Abstract: Decision processes with incomplete state feedback have
been traditionally modeled as Partially Observable Markov Decision
Processes. In this paper, we present an alternative formulation based
on probabilistic regular languages. The proposed approach generalizes
the recently reported work on language measure theoretic optimal
control for perfectly observable situations and shows that such a
framework is far more computationally tractable than the classical
alternative. In particular, we show that the infinite horizon decision
problem under partial observation, modeled in the proposed framework,
is λ-approximable and, in general, is no harder to solve than the
fully observable case. The approach is illustrated via two simple
examples.

Index Terms — POMDP; Formal Language Theory; Partial 
Observation; Language Measure; Discrete Event Systems 

1. Introduction & Motivation 

Planning under uncertainty is one of the oldest and most 
studied problems in research literature pertaining to auto- 
mated decision making and artificial intelligence. The cen- 
tral objective is to sequentially choose control actions for one 
or more agents interacting with the operating environment 
such that some associated reward function is maximized for 
a pre-specified finite future (finite horizon problems) or for 
all possible futures (infinite horizon problems). Among the 
various mathematical formalisms studied to model and solve 
such problems, Markov Decision Processes (MDPs) have 
received significant attention. A brief overview of the current
state of the art in MDP-based decision theoretic planning is
necessary to place this work in appropriate context.

1.1. Markov Decision Processes 

MDP models [26], [34] extend the classical planning framework
[21], [24], [25], [18] to accommodate uncertain effects of agent
actions, with the associated control algorithms attempting to
maximize expected reward. They are capable, in theory, of handling
realistic decision scenarios arising in operations research, optimal
control theory and, more recently, autonomous mission planning in
probabilistic robotics [1]. In brief, an MDP consists of states and
actions with a set of action-specific probability transition matrices
allowing one to compute the distribution over model states resulting
from the execution of a particular action sequence. Thus the end
state resulting from an action is not known uniquely a priori.
However, the agent is assumed to occupy one and only one state at any
given time, which is correctly observed once the action sequence is
complete. Furthermore, each
state is associated with a reward value and the performance 
of a controlled MDP is the integrated reward over specified 
operation time (which can be infinite). A partially observable 
Markov decision process (POMDP) is a generalization of 
MDPs which assumes actions to be nondeterministic as in 
a MDP but relaxes the assumption of perfect knowledge of 
the current model state. 

A policy for a MDP is a mapping from the set of states 
to the set of actions. If both sets are assumed to be finite, 
the number of possible mappings is also finite implying that 
an optimal policy can be found by conducting search over 
this finite set. In a POMDP, on the other hand, the current 
state can be only estimated as a distribution over underlying 
model states as a function of operation and observation 
history. The space of all such estimations or belief states is 
a continuous space although the underlying model has only 
a finite number of states. In contrast to MDPs, a POMDP 
policy is a mapping from the belief space to the set of actions 
implying that computation of the optimal policy demands 
a search over a continuum making the problem drastically 
more difficult to solve. 

1.2. Negative Results Pertaining to POMDP Solution 

As stated above, an optimal solution to a POMDP is 
a policy which specifies actions to execute in response to 
state feedback with the objective of maximizing performance. 
Policies may be deterministic with a single action specified 
at each belief state or stochastic which specify an allowable 
choice of actions at each state. Policies can be also cate- 
gorized as stationary, time dependent or history dependent; 
stationary policies only depend on the current belief state, 
time dependent policies may vary with the operation time 
and history dependent policies vary with the state history. 
The current state-of-the-art POMDP solution algorithms [35], [6]
are all variations of Sondik's original work [32] on value iteration
based on Dynamic Programming (DP). Value iterations, in general, are
required to solve large numbers of linear programs at each DP update
and consequently suffer from exponential worst case complexity. Given
that it is hard to find an optimal policy, it is natural to try to
seek one that is good enough. Ideally, one would be reasonably
satisfied to have an algorithm guaranteed to be fast which produces
a policy that is reasonably close (λ-approximation) to the optimal
solution. Unfortunately, existence of such algorithms is unlikely or,
in some cases, impossible. Complexity results show that POMDP
solutions are nonapproximable [4], [19], [20], with the above stated
guarantee existing in general only if certain complexity classes
collapse. For example, the optimal stationary policy for POMDPs of
finite state space can be λ-approximated if and only if P=NP.
Table I, reproduced from [19], summarizes the known complexity
results in this context. Thus finding the history dependent optimal
policy for even a finite horizon POMDP is PSPACE-complete. Since this
is a broader problem class than NP, the result suggests that POMDP
problems are even harder than NP-complete problems. Clearly, infinite
horizon POMDPs can be no easier to solve than finite horizon POMDPs.
In spite of recent development of new exact and approximate
algorithms to efficiently compute optimal solutions [6] and machine
learning approaches to cope with uncertainty [16], the most efficient
algorithms to date are able to compute near optimal solutions only
for POMDPs of relatively small state spaces.

TABLE I
λ-Approximability Of Optimal POMDP Solutions

Policy              Horizon   Approximability
Stationary          K         Not unless P=NP
Time-dependent      K         Not unless P=NP
History-dependent   K         Not unless P=PSPACE
Stationary          ∞         Not unless P=NP
Time-dependent      ∞         Uncomputable

1.3. Probabilistic Regular Language Based Models 

This work investigates decision-theoretic planning under 
partial observation in a framework distinct from the MDP 
philosophy. Decision processes are modeled as Probabilistic 
Finite State Automata (PFSA) which act as generators of 
probabilistic regular languages [11]. 

It is important to note that the PFSA model used
in this paper is conceptually very different from the
notion of probabilistic automata introduced by Rabin,
Paz and others [27], [23] and essentially follows
the formulation of p-language theoretic analysis first
reported by Garg et al. [14], [13].

The key differences between the MDP framework and PFSA 
based modeling can be enumerated briefly as follows: 
1) In both MDP and PFSA formalisms, we have the notion
of states. The notion of actions in the former is analogous
to that of events in the latter. However, unlike actions
in the MDP framework, which can be executed at will
(if defined at the current state), generation of events
in the context of PFSA models is probabilistic. Also,
such events are categorized as being controllable or
uncontrollable. A controllable event can be "disabled" so
that state change due to generation of that particular
event is inhibited; uncontrollable events, on the other
hand, cannot be disabled in this sense.

2) For an MDP, given a state and an action selected for
execution, we can only compute the probability distribution
over model states resulting from the action; although
the agent ends up in a unique state due to execution
of the chosen action, this end state cannot be determined
a priori. For a PFSA, on the other hand, given a state, we
only know the probability of occurrence of each alphabet
symbol as the next to-be-generated event, each of which
causes a transition to an a priori known unique end state;
however, the next state is still uncertain due to the
possible execution of uncontrollable events defined at the
current state. Thus, both formalisms aim to capture the
uncertain effects of agent decisions, albeit via different
mechanisms.

3) Transition probabilities in MDPs are, in general, func- 
tions of both the current state and the action executed; 
i.e. there are m transition probability matrices where 
m is the cardinality of the set of actions. PFSA models, 
on the other hand, have only one transition probability 
matrix computed from the state based event generation 
probabilities. 

4) It is clear that MDPs emphasize states and state- 
sequences; while PFSA models emphasize events and 
event-sequences. For example, in POMDPs, the observa- 
tions are states; while those in the observability model 
for PFSAs (as adopted in this paper) are events. 

5) In other words, partial observability in MDP directly 
results in not knowing the current state; in PFSA models 
partial observability results in not knowing transpired 
events which as an effect causes confusion in the deter- 
mination of the current state. 

This paper presents an efficient algorithm for computing the 
history-dependent [19] optimal supervision policy for infinite 
horizon decision problems modeled in the PFSA framework. 
The key tool used is the recently reported concept of a 
rigorous language measure for probabilistic finite state lan- 
guage generators [9]. This is a generalization of the work 
on language measure-theoretic optimal control for the fully 
observable case [12] and we show in this paper, that the 
partially observable scenario is no harder to solve in this 
modeling framework. 

The rest of the paper is organized in five additional sections and
two brief appendices. Section 2 introduces the preliminary
concepts and relevant results from reported literature. Section 3
presents an online implementation of the language
measure-theoretic supervision policy for perfectly observable
plants which lays the framework for the subsequent development
of the proposed optimal control policy for partially
observable systems in Section 4. The theoretical development
is verified and validated in two simulated examples
in Section 5. The paper is summarized and concluded in
Section 6 with recommendations for future work.

Fig. 1. Comparison of modeling semantics for MDPs and PFSA

2. Preliminary Concepts & Related Work 

This section presents the formal definition of the PFSA 
model and summarizes the concept of signed real measure 
of regular languages; the details are reported in [29] [30] [9]. 
Also, we briefly review the computation of the unique max- 
imally permissive optimal control policy for probabilistic 
finite state automata (PFSA) [12] via maximization of the 
language measure. In the sequel, this measure-theoretic 
approach will be generalized to address partially observable 
cases and is thus critical to the development presented in 
this paper. 

2.1. The PFSA Model 

Let $G_i = (Q, \Sigma, \delta, q_i, Q_m)$ be a finite-state automaton model
that encodes all possible evolutions of the discrete-event
dynamics of a physical plant, where $Q = \{q_k : k \in \mathcal{I}_Q\}$ is
the set of states and $\mathcal{I}_Q = \{1, 2, \cdots, n\}$ is the index set of
states; the automaton starts with the initial state $q_i$; the
alphabet of events is $\Sigma = \{\sigma_k : k \in \mathcal{I}_\Sigma\}$, having
$\Sigma \cap \mathcal{I}_Q = \emptyset$ and $\mathcal{I}_\Sigma = \{1, 2, \cdots, \ell\}$ is the index set
of events; $\delta : Q \times \Sigma \to Q$ is the (possibly partial) function
of state transitions; and $Q_m = \{q_{m_1}, q_{m_2}, \cdots, q_{m_l}\} \subseteq Q$ is the
set of marked (i.e., accepted) states with $q_{m_k} = q_j$ for some
$j \in \mathcal{I}_Q$. Let $\Sigma^*$ be the Kleene closure of $\Sigma$, i.e., the set of all
finite-length strings made of the events belonging to $\Sigma$ as well
as the empty string $\epsilon$ that is viewed as the identity of the
monoid $\Sigma^*$ under the operation of string concatenation, i.e.,
$\epsilon s = s = s \epsilon$. The state transition map $\delta$ is recursively
extended to its reflexive and transitive closure
$\delta^* : Q \times \Sigma^* \to Q$ by defining

$$\forall q_i \in Q, \quad \delta^*(q_i, \epsilon) = q_i \tag{1a}$$
$$\forall q_i \in Q, \sigma \in \Sigma, s \in \Sigma^*, \quad \delta^*(q_i, \sigma s) = \delta^*(\delta(q_i, \sigma), s) \tag{1b}$$
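
As an illustration, the recursive extension in Eqs. (1a) and (1b) can be sketched in a few lines of Python. The two-state plant and the names `DELTA` and `delta_star` are hypothetical and serve only as an example; a partial transition function is modeled by missing dictionary keys, which is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical plant: (state, event) -> next state. Missing keys model the
# "possibly partial" transition function delta.
DELTA: Dict[Tuple[str, str], str] = {
    ("q1", "a"): "q2",
    ("q2", "a"): "q2",
    ("q2", "b"): "q1",
}


def delta_star(delta: Dict[Tuple[str, str], str],
               state: str, string: str) -> Optional[str]:
    """Extended transition map: delta*(q, eps) = q and
    delta*(q, sigma s) = delta*(delta(q, sigma), s).

    Returns None if some transition along the string is undefined,
    i.e. the string does not belong to L(q).
    """
    current = state
    for event in string:  # an empty string leaves the state unchanged (1a)
        nxt = delta.get((current, event))
        if nxt is None:
            return None
        current = nxt  # unrolls the recursion (1b) one event at a time
    return current
```

With this plant, the string `"aab"` drives `q1` back to `q1`, while `"b"` is not defined at `q1`.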

Definition 2.1: The language $L(q_i)$ generated by a DFSA $G$
initialized at the state $q_i \in Q$ is defined as:

$$L(q_i) = \{s \in \Sigma^* \mid \delta^*(q_i, s) \in Q\} \tag{2}$$

The language $L_m(q_i)$ marked by the DFSA $G$ initialized at
the state $q_i \in Q$ is defined as:

$$L_m(q_i) = \{s \in \Sigma^* \mid \delta^*(q_i, s) \in Q_m\} \tag{3}$$

Definition 2.2: For every $q_j \in Q$, let $L(q_i, q_j)$ denote the set
of all strings that, starting from the state $q_i$, terminate at the
state $q_j$, i.e.,

$$L_{i,j} = \{s \in \Sigma^* \mid \delta^*(q_i, s) = q_j \in Q\} \tag{4}$$

To complete the specification of a probabilistic finite state
automaton, we need to specify the event generation probabilities
and the state characteristic weight vector, which we
define next.

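Definition 2.3: The event generation probabilities are specified
by the function $\tilde{\pi} : Q \times \Sigma^* \to [0, 1]$ such that
$\forall q_i \in Q, \forall \sigma_k \in \Sigma, \forall s \in \Sigma^*$,

(1) $\tilde{\pi}(q_i, \sigma_k) = \tilde{\pi}_{ik} \in [0, 1)$; $\sum_k \tilde{\pi}_{ik} = 1 - \theta$, with $\theta \in (0, 1)$;

(2) $\tilde{\pi}(q_i, \sigma) = 0$ if $\delta(q_i, \sigma)$ is undefined; $\tilde{\pi}(q_i, \epsilon) = 1$;

(3) $\tilde{\pi}(q_i, \sigma_k s) = \tilde{\pi}(q_i, \sigma_k)\, \tilde{\pi}(\delta(q_i, \sigma_k), s)$.

Notation 2.1: The $n \times \ell$ event cost matrix $\tilde{\Pi}$ is defined as:
$\tilde{\Pi}|_{ij} = \tilde{\pi}(q_i, \sigma_j)$

Definition 2.4: The state transition probability $\pi : Q \times Q \to [0, 1)$,
of the DFSA $G_i$ is defined as follows:

$$\forall q_i, q_j \in Q, \quad \pi_{ij} = \sum_{\sigma \in \Sigma \ \text{s.t.}\ \delta(q_i, \sigma) = q_j} \tilde{\pi}(q_i, \sigma) \tag{5}$$

Notation 2.2: The $n \times n$ state transition probability matrix
$\Pi$ is defined as $\Pi|_{ij} = \pi(q_i, q_j)$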
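
The passage from event generation probabilities to the single state transition matrix of Definition 2.4 can be illustrated as follows. This is a minimal sketch under stated assumptions: the plant data (`STATES`, `DELTA`, `PI_TILDE`) are invented for the example, and dictionaries keyed by (state, event) pairs are just one convenient encoding.

```python
from typing import Dict, List, Tuple

Transition = Tuple[str, str]  # (state, event)


def transition_matrix(states: List[str],
                      delta: Dict[Transition, str],
                      pi_tilde: Dict[Transition, float]) -> List[List[float]]:
    """Eq. (5): pi_ij sums pi_tilde(q_i, sigma) over all events sigma
    with delta(q_i, sigma) = q_j."""
    index = {q: i for i, q in enumerate(states)}
    matrix = [[0.0] * len(states) for _ in states]
    for (q, sigma), prob in pi_tilde.items():
        target = delta.get((q, sigma))
        if target is None:
            continue  # pi_tilde is 0 where delta is undefined; skip defensively
        matrix[index[q]][index[target]] += prob
    return matrix


# Hypothetical terminating plant: each row of PI_TILDE sums to 1 - theta = 0.9.
STATES = ["q1", "q2"]
DELTA = {("q1", "a"): "q2", ("q1", "b"): "q2",
         ("q2", "a"): "q1", ("q2", "b"): "q2"}
PI_TILDE = {("q1", "a"): 0.3, ("q1", "b"): 0.6,
            ("q2", "a"): 0.5, ("q2", "b"): 0.4}
PI = transition_matrix(STATES, DELTA, PI_TILDE)
```

Both events at `q1` lead to `q2`, so their probabilities are merged into one entry; this is how a PFSA ends up with a single matrix rather than one matrix per action.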

The set $Q_m$ of marked states is partitioned into $Q^+$ and
$Q^-$, i.e., $Q_m = Q^+ \cup Q^-$ and $Q^+ \cap Q^- = \emptyset$, where $Q^+$
contains all good marked states that we desire to reach, and
$Q^-$ contains all bad marked states that we want to avoid,
although it may not always be possible to completely avoid
the bad states while attempting to reach the good states.
To characterize this, each marked state is assigned a real
value based on the designer's perception of its impact on the
system performance.
Definition 2.5: The characteristic function $\chi : Q \to [-1, 1]$
that assigns a signed real weight to state-based sublanguages
$L(q_i, q)$ is defined as:

$$\forall q \in Q, \quad \chi(q) \in \begin{cases} [-1, 0), & q \in Q^- \\ \{0\}, & q \notin Q_m \\ (0, 1], & q \in Q^+ \end{cases} \tag{6}$$

The state weighting vector, denoted by $\chi = [\chi_1\ \chi_2\ \cdots\ \chi_n]^T$,
where $\chi_j = \chi(q_j)\ \forall j \in \mathcal{I}_Q$, is called the $\chi$-vector. The $j$-th
element $\chi_j$ of the $\chi$-vector is the weight assigned to the
corresponding terminal state $q_j$.

Remark 2.1: The state characteristic function $\chi : Q \to [-1, 1]$
or equivalently the characteristic vector $\chi$ is analogous
to the notion of the reward function in MDP analysis.
However, unlike MDP models, where the reward (or penalty)
is put on individual state-based actions, in our model, the
characteristic is put on the state itself. The similarity of
the two notions is clarified by noting that just as MDP
performance can be evaluated as the total reward garnered as
actions are executed sequentially, the performance of a PFSA
can be computed by summing the characteristics of the states
visited due to transpired event sequences.

Plant models considered in this paper are deterministic finite
state automata (plant) with well-defined event occurrence
probabilities. In other words, the occurrence of events is
probabilistic, but the state at which the plant ends up,
given a particular event has occurred, is deterministic. No
emphasis is laid on the initial state of the plant, i.e., we
allow for the fact that the plant may start from any state.
Furthermore, having defined the characteristic state weight
vector $\chi$, it is not necessary to specify the set of marked
states, because if $\chi_i = 0$, then $q_i$ is not marked and if
$\chi_i \neq 0$, then $q_i$ is marked.

Definition 2.6: (Control Philosophy) If $q_i \xrightarrow{\sigma} q_k$, and the
event $\sigma$ is disabled at state $q_i$, then the supervisory action is
to prevent the plant from making a transition to the state $q_k$,
by forcing it to stay at the original state $q_i$. Thus disabling
any transition $\sigma$ at a given state $q$ results in deletion of the
original transition and appearance of the self-loop $\delta(q, \sigma) = q$
with the occurrence probability of $\sigma$ from the state $q$ remaining
unchanged in the supervised and unsupervised plants.

Definition 2.7: (Controllable Transitions) For a given
plant, transitions that can be disabled in the sense of
Definition 2.6 are defined to be controllable transitions. The
set of controllable transitions in a plant is denoted $\mathscr{C}$. Note
that controllability is state-based.

It follows that plant models can be specified by the sextuplet:

$$G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathscr{C}) \tag{7}$$
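
The control philosophy of Definition 2.6 amounts to a small structural edit of the transition function. The sketch below assumes the same dictionary encoding of $\delta$ used for illustration elsewhere, with the controllable set given as (state, event) pairs; the names `disable`, `DELTA` and `CONTROLLABLE` are hypothetical.

```python
from typing import Dict, Set, Tuple

Transition = Tuple[str, str]  # (state, event)


def disable(delta: Dict[Transition, str],
            controllable: Set[Transition],
            state: str, event: str) -> Dict[Transition, str]:
    """Return the supervised transition function in which the transition
    (state, event) is replaced by the self-loop delta(state, event) = state.

    Event generation probabilities are untouched, as in Definition 2.6,
    so only the target state changes.
    """
    if (state, event) not in controllable:
        raise ValueError("transition is not controllable at this state")
    supervised = dict(delta)  # leave the unsupervised plant intact
    supervised[(state, event)] = state
    return supervised


# Hypothetical plant in which only the transition ("q1", "a") is controllable.
DELTA = {("q1", "a"): "q2", ("q2", "b"): "q1"}
CONTROLLABLE = {("q1", "a")}
SUPERVISED = disable(DELTA, CONTROLLABLE, "q1", "a")
```

Because controllability is state-based, the controllable set holds (state, event) pairs rather than bare event labels.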

2.2. Formal Language Measure for Terminating Plants 

The formal language measure is first defined for terminating
plants [14] with sub-stochastic event generation
probabilities, i.e., the event generation probabilities at each
state summing to strictly less than unity. In general, the
marked language $L_m(q_i)$ consists of both good and bad event
strings that, starting from the initial state $q_i$, lead to $Q^+$ and
$Q^-$ respectively. Any event string belonging to the language
$L^0(q_i) = L(q_i) - L_m(q_i)$ leads to one of the non-marked states
belonging to $Q - Q_m$ and $L^0$ does not contain any one of the
good or bad strings. Based on the equivalence classes defined
in the Myhill-Nerode Theorem [17], the regular languages
$L(q_i)$ and $L_m(q_i)$ can be expressed as:

$$L(q_i) = \bigcup_{q_k \in Q} L_{i,k} \tag{8}$$

$$L_m(q_i) = \bigcup_{q_k \in Q_m} L_{i,k} = L^+ \cup L^- \tag{9}$$

where the sublanguage $L_{i,k} \subseteq L(q_i)$ having the initial state $q_i$
is uniquely labelled by the terminal state $q_k$, $k \in \mathcal{I}_Q$ and
$L_{i,j} \cap L_{i,k} = \emptyset\ \forall j \neq k$; and $L^+ = \bigcup_{q_k \in Q^+} L_{i,k}$ and
$L^- = \bigcup_{q_k \in Q^-} L_{i,k}$ are good and bad sublanguages of $L_m(q_i)$,
respectively. Then, $L^0 = \bigcup_{q_k \notin Q_m} L_{i,k}$ and
$L(q_i) = L^0 \cup L^+ \cup L^-$.

A signed real measure $\mu^i : 2^{L(q_i)} \to \mathbb{R} = (-\infty, +\infty)$ is
constructed on the $\sigma$-algebra $2^{L(q_i)}$ for any $i \in \mathcal{I}_Q$; interested
readers are referred to [29] [30] for the details of
measure-theoretic definitions and results. With the choice of this
$\sigma$-algebra, every singleton set made of an event string $s \in L(q_i)$
is a measurable set. By the Hahn Decomposition Theorem [31],
each of these measurable sets qualifies itself to have a
numerical value based on the above state-based decomposition
of $L(q_i)$ into $L^0$ (null), $L^+$ (positive), and $L^-$ (negative)
sublanguages.

Definition 2.8: Let $\omega \in L(q_i, q_j) \subseteq 2^{L(q_i)}$. The signed real
measure $\mu^i$ of every singleton string set $\{\omega\}$ is defined as:

$$\mu^i(\{\omega\}) = \tilde{\pi}(q_i, \omega)\, \chi(q_j) \tag{10}$$

The signed real measure of a sublanguage $L_{i,j} \subseteq L(q_i)$ is
defined as:

$$\mu_{i,j} = \mu^i(L(q_i, q_j)) = \Big( \sum_{\omega \in L(q_i, q_j)} \tilde{\pi}(q_i, \omega) \Big) \chi_j \tag{11}$$

Therefore, the signed real measure of the language of a
DFSA $G_i$ initialized at $q_i \in Q$, is defined as

$$\mu_i = \mu^i(L(q_i)) = \sum_{j \in \mathcal{I}_Q} \mu^i(L_{i,j}) \tag{12}$$

It is shown in [29] [30] that the language measure in Eq. (12)
can be expressed as

$$\mu_i = \sum_{j \in \mathcal{I}_Q} \pi_{ij}\, \mu_j + \chi_i \tag{13}$$

The language measure vector, denoted as
$\mu = [\mu_1\ \mu_2\ \cdots\ \mu_n]^T$, is called the $\mu$-vector. In vector form,
Eq. (13) becomes

$$\mu = \Pi \mu + \chi \tag{14}$$

whose solution is given by

$$\mu = (I - \Pi)^{-1} \chi \tag{15}$$



The inverse in Eq. (15) exists for terminating plant models
[14][13] because $\Pi$ is a contraction operator [29] [30] due
to the strict inequality $\sum_j \pi_{ij} < 1$. The residual
$\theta_i = 1 - \sum_j \pi_{ij}$ is referred to as the termination probability
for state $q_i \in Q$. We extend the analysis to non-terminating
plants [14][13] with stochastic transition probability matrices
(i.e., with $\theta_i = 0, \forall q_i \in Q$) by renormalizing the language
measure [9] with respect to the uniform termination probability
of a limiting terminating model as described next.

Let $\tilde{\Pi}$ and $\Pi$ be the stochastic event generation and
transition probability matrices for a non-terminating plant
$G_i = (Q, \Sigma, \delta, q_i, Q_m)$. We consider the terminating plant $G_i(\theta)$
with the same DFSA structure $(Q, \Sigma, \delta, q_i, Q_m)$ such that the
event generation probability matrix is given by $(1 - \theta)\tilde{\Pi}$ with
$\theta \in (0, 1)$, implying that the state transition probability matrix
is $(1 - \theta)\Pi$.

Definition 2.9: (Renormalized Measure) The renormalized
measure $\nu^i_\theta : 2^{L(q_i)} \to [-1, 1]$ for the $\theta$-parametrized
terminating plant $G_i(\theta)$ is defined as:

$$\forall \omega \in L(q_i), \quad \nu^i_\theta(\{\omega\}) = \theta\, \mu^i(\{\omega\}) \tag{16}$$

The corresponding matrix form is given by

$$\nu_\theta = \theta \mu = \theta\, [I - (1 - \theta)\Pi]^{-1} \chi \quad \text{with } \theta \in (0, 1) \tag{17}$$

We note that the vector representation allows for the following
notational simplification

$$\nu^i_\theta(L(q_i)) = \nu_\theta|_i \tag{18}$$

The renormalized measure for the non-terminating plant $G_i$
is defined to be $\lim_{\theta \to 0^+} \nu^i_\theta$.
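
To make the matrix form in Eq. (17) concrete, the sketch below evaluates the renormalized measure numerically for a small, invented plant. It is not the procedure of the cited references: instead of inverting the matrix, it iterates the fixed-point relation $\mu = \chi + (1 - \theta)\Pi\mu$, which converges because $(1 - \theta)\Pi$ is a contraction. The function name, tolerance and example data are assumptions.

```python
from typing import List


def renormalized_measure(Pi: List[List[float]], chi: List[float],
                         theta: float, tol: float = 1e-12,
                         max_iter: int = 1_000_000) -> List[float]:
    """Approximate nu_theta = theta * [I - (1 - theta) Pi]^{-1} chi.

    Pi is the stochastic transition matrix of the non-terminating plant;
    the terminating plant uses (1 - theta) * Pi.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    n = len(chi)
    mu = [0.0] * n
    for _ in range(max_iter):
        new = [chi[i] + (1.0 - theta) * sum(Pi[i][j] * mu[j] for j in range(n))
               for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, mu)) < tol
        mu = new
        if converged:
            break
    return [theta * m for m in mu]


# Hypothetical two-state plant with one good state (weight 1.0) and one
# mildly bad state (weight -0.25).
PI = [[0.0, 1.0], [0.5, 0.5]]
CHI = [1.0, -0.25]
NU = renormalized_measure(PI, CHI, theta=0.01)
```

For this example the stationary distribution of `PI` is (1/3, 2/3), so the entries of `NU` lie close to 1/3 - 1/6 = 1/6 when `theta` is small.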

The following results are retained for the sake of completeness.
Complete proofs can be found in [9][7].

Proposition 2.1: The limiting measure vector
$\nu_0 = \lim_{\theta \to 0^+} \nu_\theta$ exists and $\|\nu_0\|_\infty \leq 1$.

Proposition 2.2: Let $\Pi$ be the stochastic transition matrix
of a non-terminating PFSA [14], [13]. Then, as the parameter
$\theta \to 0^+$, the limiting measure vector is obtained as
$\nu_0 = \mathcal{C}(\Pi)\chi$, where the matrix operator
$\mathcal{C}(\Pi) = \lim_{k \to \infty} \frac{1}{k} \sum_{j=0}^{k-1} \Pi^j$ is the Cesaro
limit [2], [3] of the stochastic transition matrix $\Pi$.

Corollary 2.1: (to Proposition 2.2) The expression $\mathcal{C}(\Pi)\nu_\theta$ is
independent of $\theta$. Specifically, the following identity holds for
all $\theta \in (0, 1)$:

$$\mathcal{C}(\Pi)\nu_\theta = \mathcal{C}(\Pi)\chi \tag{19}$$
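
The Cesaro limit in Proposition 2.2 exists even when the powers of the transition matrix do not converge. The following sketch, which only approximates the limit by a finite average, uses a periodic two-state chain to show this; the helper names and the example are assumptions chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Plain square matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cesaro_average(Pi: Matrix, k: int) -> Matrix:
    """Finite-k approximation (1/k) * sum_{j=0}^{k-1} Pi^j of the Cesaro limit."""
    n = len(Pi)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # Pi^0
    total = [[0.0] * n for _ in range(n)]
    for _ in range(k):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, Pi)
    return [[total[i][j] / k for j in range(n)] for i in range(n)]


# Periodic chain: Pi^j alternates between the identity and the swap matrix,
# so Pi^j has no limit, yet the running average settles at 0.5 everywhere.
PI = [[0.0, 1.0], [1.0, 0.0]]
CHI = [1.0, -0.5]
C = cesaro_average(PI, 1000)
NU0 = [sum(C[i][j] * CHI[j] for j in range(2)) for i in range(2)]
```

Here `NU0` plays the role of $\mathcal{C}(\Pi)\chi$: both states receive the same limiting measure because the chain visits the two states equally often in the long run.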
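Notation 2.3: The linearly independent orthogonal set
$\{v^i \in \mathbb{R}^{\mathrm{Card}(Q)} : v^i_j = \delta_{ij}\}$ is denoted as $\mathcal{B}$, where $\delta_{ij}$
denotes the Kronecker delta function. We note that there is a
one-to-one onto mapping between the states $q_i \in Q$ and the
elements of $\mathcal{B}$, namely,

$$q_i \mapsto v^i, \qquad v^i_k = \begin{cases} 1 & \text{if } k = i \\ 0 & \text{otherwise} \end{cases}$$

Definition 2.10: For any non-zero vector $v \in \mathbb{R}^{\mathrm{Card}(Q)}$, the
normalizing function $\mathcal{N} : \mathbb{R}^{\mathrm{Card}(Q)} \setminus 0 \to \mathbb{R}^{\mathrm{Card}(Q)}$ is defined
as $\mathcal{N}(v) =$

(20)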

2.3. The Optimal Supervision Problem: Formulation & 
Solution 

A supervisor disables a subset of the set of controllable
transitions and hence there is a bijection between the set
of all possible supervision policies and the power set $2^{\mathscr{C}}$.
That is, there exist $2^{|\mathscr{C}|}$ possible supervisors and each
supervisor is uniquely identifiable with a subset of $\mathscr{C}$, and the
corresponding language measure $\nu_\theta$ allows a quantitative
comparison of different policies.

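Definition 2.11: For an unsupervised plant
$G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathscr{C})$, let $G^\dagger$ and $G^\ddagger$ be the supervised plants
with sets of disabled transitions, $\mathscr{D}^\dagger \subseteq \mathscr{C}$ and $\mathscr{D}^\ddagger \subseteq \mathscr{C}$,
respectively, whose measures are $\nu^\dagger$ and $\nu^\ddagger$. Then, the
supervisor that disables $\mathscr{D}^\dagger$ is defined to be superior to the
supervisor that disables $\mathscr{D}^\ddagger$ if $\nu^\dagger \geq_{\text{elementwise}} \nu^\ddagger$ and strictly
superior if $\nu^\dagger >_{\text{elementwise}} \nu^\ddagger$.

Definition 2.12: (Optimal Supervision Problem) Given a
(non-terminating) plant $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathscr{C})$, the problem
is to compute a supervisor that disables a subset $\mathscr{D}^* \subseteq \mathscr{C}$,
such that $\nu^* \geq_{\text{elementwise}} \nu^\dagger\ \forall \mathscr{D}^\dagger \subseteq \mathscr{C}$, where $\nu^*$ and $\nu^\dagger$ are the
measure vectors of the supervised plants $G^*$ and $G^\dagger$ under $\mathscr{D}^*$
and $\mathscr{D}^\dagger$ respectively.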

Remark 2.2: The solution to the optimal supervision problem
is obtained in [12], [7] by designing an optimal policy for
a terminating plant [14], [13] with a substochastic transition
probability matrix $(1 - \theta)\Pi$ with $\theta \in (0, 1)$. To ensure that the
computed optimal policy coincides with the one for $\theta = 0$,
the suggested algorithm chooses a small value for $\theta$ in each
iteration step of the design algorithm. However, choosing
$\theta$ too small may cause numerical problems in convergence.
Algorithm B.2 (see Appendix B) computes the critical lower
bound $\theta_\star$ (i.e., how small a $\theta$ is actually required). In
conjunction with Algorithm B.2, the optimal supervision problem is
solved by use of Algorithm B.1 for a generic PFSA as reported
in [12][7].

The following results in Proposition 2.3 are critical to
development in the sequel and hence are presented here without
proof. The complete proofs are available in [12][7].

Proposition 2.3: 1) (Monotonicity) Let $\nu^{[k]}$ be the language
measure vector computed in the $k$-th iteration of
Algorithm B.1. The measure vectors computed by the
algorithm form an elementwise non-decreasing sequence,
i.e., $\nu^{[k+1]} \geq_{\text{elementwise}} \nu^{[k]}\ \forall k$.

2) (Effectiveness) Algorithm B.1 is an effective procedure [17],
i.e., it is guaranteed to terminate.

3) (Optimality) The supervision policy computed by
Algorithm B.1 is optimal in the sense of Definition 2.12.

4) (Uniqueness) Given an unsupervised plant $G$, the optimal
supervisor $G^*$, computed by Algorithm B.1, is unique
in the sense that it is maximally permissive among all
possible supervision policies with optimal performance.
That is, if $\mathscr{D}^*$ and $\mathscr{D}^\dagger$ are the disabled transition sets,
and $\nu^*$ and $\nu^\dagger$ are the language measure vectors for $G^*$
and an arbitrarily supervised plant $G^\dagger$, respectively, then

$$\nu^* =_{\text{elementwise}} \nu^\dagger \implies \mathscr{D}^* \subseteq \mathscr{D}^\dagger \subseteq \mathscr{C}$$

Definition 2.13: Following Remark 2.2, we note that
Algorithm B.2 computes a lower bound for the critical termination
probability for each iteration of Algorithm B.1 such that
the disabling/enabling decisions for the terminating plant
coincide with the given non-terminating model. We define

$$\theta_{\min} = \min_k \theta_\star^{[k]} \tag{21}$$

where $\theta_\star^{[k]}$ is the termination probability computed by
Algorithm B.2 in the $k$-th iteration of Algorithm B.1.

Definition 2.14: If $G$ and $G^*$ are the unsupervised and
optimally supervised PFSA respectively, then we denote the
renormalized measure of the terminating plant $G^*(\theta_{\min})$ as
$\nu^i_* : 2^{L(q_i)} \to [-1, 1]$ (see Definition 2.9). Hence, in vector
notation we have:

$$\nu_* = \theta_{\min}\, [I - (1 - \theta_{\min})\Pi^*]^{-1} \chi \tag{22}$$

where $\Pi^*$ is the transition probability matrix of the supervised
plant $G^*$.

Remark 2.3: Referring to Algorithm B.1, it is noted that
$\nu_* = \nu^{[K]}$ where $K$ is the total number of iterations for
Algorithm B.1.

2.4. The Partial Observability Model 

The observation model used in this paper is defined by 
the so-called unobservability maps developed in [10] as a 
generalization of natural projections in discrete event sys- 
tems. It is important to mention that while some authors 
refer to unobservability as the case where no transitions are 
observable in the system; we use the terms "unobservable" 
and "partially observable" interchangeably in the sequel. The 
relevant concepts developed in [10] are enumerated in this 
section for the sake of completeness. 

2.4.1 Assumptions & Notations: We make two key as- 
sumptions: 

• The unobservability situation in the model is specified 
by a bounded memory unobservability map p which is 
available to the supervisor. 

• Unobservable transitions are uncontrollable.

Definition 2.15: An unobservability map $p : Q \times \Sigma^* \to \Sigma^*$
for a given model $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathscr{C})$ is defined recursively
as follows: $\forall q_i \in Q$, $\sigma_j \in \Sigma$ and $\sigma_j \omega \in L(q_i)$,

$$p(q_i, \sigma_j) = \begin{cases} \epsilon, & \text{if } \sigma_j \text{ is unobservable from } q_i \\ \sigma_j, & \text{otherwise} \end{cases} \tag{23a}$$

$$p(q_i, \sigma_j \omega) = p(q_i, \sigma_j)\, p(\delta(q_i, \sigma_j), \omega) \tag{23b}$$

We can mark transitions as unobservable in the graph
for the automaton $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathscr{C})$, and
this would suffice for a complete specification of the
unobservability map acting on the plant. The assumption of
bounded memory of the unobservability maps implies that
although we may need to unfold the automaton graph to
unambiguously indicate the unobservable transitions, there
exists a finite unfolding that suffices for our purpose. Such
unobservability maps were referred to as regular in [10].

Remark 2.4: The unobservability maps considered in this 
paper are state based as opposed to being event based observ- 
ability considered in [28]. 

Definition 2.16: A string $\omega \in \Sigma^*$ is called unobservable at
the supervisory level if at least one of the events in $\omega$ is
unobservable, i.e., $p(q_i, \omega) \neq \omega$. Similarly, a string $\omega \in \Sigma^*$
is called completely unobservable if each of the events in
$\omega$ is unobservable, i.e., $p(q_i, \omega) = \epsilon$. Also, if there are no
unobservable strings, we denote the unobservability map $p$
as trivial.
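
A state-based unobservability map in the sense of Definition 2.15 can be sketched by erasing events along the path that a string traces through the plant. The example assumes that unobservability is specified directly on (state, event) pairs of the given graph, with no unfolding needed; the plant and the names `project`, `DELTA` and `UNOBSERVABLE` are hypothetical.

```python
from typing import Dict, Set, Tuple

Transition = Tuple[str, str]  # (state, event)


def project(delta: Dict[Transition, str], unobservable: Set[Transition],
            state: str, string: str) -> str:
    """Apply p(q, sigma w) = p(q, sigma) p(delta(q, sigma), w): an event is
    erased exactly when it is unobservable from the state where it occurs."""
    observed = []
    current = state
    for event in string:
        if (current, event) not in delta:
            raise ValueError("string is not in L(q)")
        if (current, event) not in unobservable:
            observed.append(event)
        current = delta[(current, event)]
    return "".join(observed)


# Event "a" is unobservable from q1 but observable from q2.
DELTA = {("q1", "a"): "q2", ("q2", "a"): "q1", ("q2", "b"): "q2"}
UNOBSERVABLE = {("q1", "a")}
```

With this data, `project(DELTA, UNOBSERVABLE, "q1", "aba")` gives `"ba"`, so the string is unobservable in the sense of Definition 2.16, and the single-event string `"a"` from `q1` is completely unobservable.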

The subsequent analysis requires the notion of the phantom 
automaton introduced in [8]. The following definition is 
included for the sake of completion. 

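Definition 2.17: Given a model $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathscr{C})$ and
an unobservability map $p$, the phantom automaton
$\mathscr{P}(G) = (Q, \Sigma, \mathscr{P}(\delta), \mathscr{P}(\tilde{\Pi}), \chi, \mathscr{P}(\mathscr{C}))$ is defined as follows:

$$\mathscr{P}(\delta)(q_i, \sigma_j) = \begin{cases} \delta(q_i, \sigma_j), & \text{if } p(q_i, \sigma_j) = \epsilon \\ \text{Undefined}, & \text{otherwise} \end{cases} \tag{24a}$$

$$\mathscr{P}(\tilde{\Pi})(q_i, \sigma_j) = \begin{cases} \tilde{\pi}(q_i, \sigma_j), & \text{if } p(q_i, \sigma_j) = \epsilon \\ 0, & \text{otherwise} \end{cases} \tag{24b}$$

$$\mathscr{P}(\mathscr{C}) = \emptyset \tag{24c}$$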
Remark 2.5: The phantom automaton in the sense of
Definition 2.17 is a finite state machine description of the language
of completely unobservable strings resulting from the
unobservability map $p$ acting on the model $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathscr{C})$.
Note that Eqn. (24c) is a consequence of the assumption that
unobservable transitions are uncontrollable. Thus no transition
in the phantom automaton is controllable.

Algorithm B.3 (see Appendix B) computes the transition
probability matrix for the phantom automaton of a given
plant $G$ under a specified unobservability map $p$ by deleting
all observable transitions from $G$.
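
The idea of deleting all observable transitions can be sketched as below. This is an illustrative reading of Definition 2.17 rather than a transcription of Algorithm B.3, and it assumes that unobservability is given as a set of (state, event) pairs; all names and the example plant are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Transition = Tuple[str, str]  # (state, event)


def phantom(states: List[str], delta: Dict[Transition, str],
            pi_tilde: Dict[Transition, float], unobservable: Set[Transition]):
    """Keep only unobservable transitions (24a), zero the generation
    probability of observable ones (24b), and assemble the resulting
    state transition matrix of the phantom automaton."""
    p_delta = {key: dst for key, dst in delta.items() if key in unobservable}
    p_pi = {key: (prob if key in unobservable else 0.0)
            for key, prob in pi_tilde.items()}
    index = {q: i for i, q in enumerate(states)}
    matrix = [[0.0] * len(states) for _ in states]
    for (q, sigma), dst in p_delta.items():
        matrix[index[q]][index[dst]] += p_pi.get((q, sigma), 0.0)
    return p_delta, p_pi, matrix


STATES = ["q1", "q2"]
DELTA = {("q1", "a"): "q2", ("q1", "b"): "q1", ("q2", "a"): "q1"}
PI_TILDE = {("q1", "a"): 0.4, ("q1", "b"): 0.6, ("q2", "a"): 1.0}
UNOBSERVABLE = {("q1", "a")}
P_DELTA, P_PI, P_MATRIX = phantom(STATES, DELTA, PI_TILDE, UNOBSERVABLE)
```

The phantom matrix is substochastic in general, since the probability mass of observable events is removed from each row.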

2.4.2 The Petri Net Observer: For a given model
$G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathscr{C})$ and a non-trivial unobservability map $p$, it is,
in general, impossible to pinpoint the current state from an
observed event sequence at the supervisory level. However,
it is possible to estimate the set of plausible states from a
knowledge of the phantom automaton $\mathscr{P}(G)$.

Definition 2.18: (Instantaneous State Description) For a
given plant $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathscr{C})$ initialized at state $q_0 \in Q$
and a non-trivial unobservability map $p$, the instantaneous
state description is defined to be the image of an observed
event sequence $\omega \in \Sigma^*$ under the map $\mathcal{Q} : p(L(G)) \to 2^Q$ as
follows:

$$\mathcal{Q}(\omega) = \{q_j \in Q : \exists s \in \Sigma^* \ \text{s.t.}\ \delta^*(q_0, s) = q_j \wedge p(q_0, s) = \omega\}$$
Remark 2.6: Note that for a trivial unobservability map $p$
with $\forall \omega \in \Sigma^*$, $p(\omega) = \omega$, we have $\mathcal{Q}(\omega) = \delta^*(q_0, \omega)$ where $q_0$ is
the initial state of the plant.
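
The instantaneous state description of Definition 2.18 can be computed for small examples by tracking the set of plausible states directly. The sketch below is a plain set-based update and not the Petri net observer discussed in this section; it assumes unobservability is given on (state, event) pairs of the plant graph, and the three-state plant is invented for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Transition = Tuple[str, str]  # (state, event)


def unobservable_reach(delta: Dict[Transition, str],
                       unobservable: Set[Transition],
                       states: Iterable[str]) -> Set[str]:
    """States reachable from `states` through unobservable transitions only."""
    reach = set(states)
    stack = list(reach)
    while stack:
        q = stack.pop()
        for (src, event), dst in delta.items():
            if src == q and (src, event) in unobservable and dst not in reach:
                reach.add(dst)
                stack.append(dst)
    return reach


def state_description(delta: Dict[Transition, str],
                      unobservable: Set[Transition],
                      q0: str, observed: str) -> Set[str]:
    """Set of states q_j with delta*(q0, s) = q_j and p(q0, s) = observed
    for some string s."""
    current = unobservable_reach(delta, unobservable, {q0})
    for event in observed:
        moved = {delta[(q, event)] for q in current
                 if (q, event) in delta and (q, event) not in unobservable}
        current = unobservable_reach(delta, unobservable, moved)
    return current


# From q1 an unobservable event "u" may silently move the plant to q3.
DELTA = {("q1", "a"): "q2", ("q1", "u"): "q3", ("q3", "a"): "q3",
         ("q2", "b"): "q1", ("q3", "b"): "q1"}
UNOBSERVABLE = {("q1", "u")}
```

After observing `"a"` from `q1`, the description is `{"q2", "q3"}`: the supervisor cannot tell whether the silent move to `q3` happened before the observed event.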

The instantaneous state description $\mathcal{Q}(\omega)$ can be estimated
on-line by constructing a Petri Net observer with flush-out
arcs [22] [15]. The advantage of using a Petri net description
is the compactness of representation and simplicity of the
on-line execution algorithm that we present next. Our preference
of a Petri net description over a subset construction
for finite state machines is motivated by the following:
The Petri net formalism is natural, due to its ability to
model transitions of the type $q_1 \to \{q_2, q_3\}$, which reflects the
condition "the plant can possibly be in states $q_2$ or $q_3$ after
an observed transition from $q_1$". One can avoid introducing
an exponentially large number of "combined states" of the
form $[q_2, q_3]$ as involved in the subset construction and, more
importantly, preserve the state description of the underlying
plant. Flush-out arcs were introduced by Gribaudo et al. [15]
in the context of fluid stochastic Petri nets. We apply this
notion to ordinary nets with similar meaning: a flush-out
arc is connected to a labeled transition, which, on firing,
removes a token from the input place (if the arc weight
is one). Instantaneous descriptions can be computed on-line
efficiently due to the following result:

Proposition 2.4: 1) Algorithm B.4 has polynomial complexity.

2) Once the Petri net observer has been computed off line,
the current possible states for any observed sequence can
be computed by executing Algorithm B.5 on-line.

Proof: Given in [10]. □

3. Online Implementation of Measure-theoretic 
Optimal Control under Perfect Observation 

This section devises an online implementation scheme for the language measure-theoretic optimal control algorithm, which will later be extended to handle plants with non-trivial unobservability maps. Formally, a supervision policy $S$ for a given plant $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathcal{C})$ specifies the control in terms of disabled controllable transitions at each state $q_i \in Q$, i.e. $S = (G, \phi)$ where

$$\phi : Q \to \{0,1\}^{\mathrm{Card}(\Sigma)} \tag{25}$$

The map $\phi$ is referred to in the literature as the state feedback map [28] and it specifies the set of disabled transitions as follows: If at state $q_i \in Q$, events $\sigma_{i_1}, \sigma_{i_2}, \ldots$ are disabled by the particular supervision policy, then $\phi(q_i)$ is a binary sequence on $\{0,1\}$ of length equal to the cardinality of the event alphabet $\Sigma$ such that the entries at positions $i_1, i_2, \ldots$ flag the disabled events.

Remark 3.1: If it is possible to partition the alphabet $\Sigma$ as $\Sigma = \Sigma_c \cup \Sigma_{uc}$, where $\Sigma_c$ is the set of controllable transitions and $\Sigma_{uc}$ is the set of uncontrollable transitions, then it suffices to consider $\phi$ as a map $\phi : Q \to \{0,1\}^{\mathrm{Card}(\Sigma_c)}$. However, since we consider controllability to be state dependent (i.e. the possibility that an event is controllable if generated at a state $q_i$ and uncontrollable if generated at some other state $q_j$), such a partitioning scheme is not feasible.

Under perfect observation, a computed supervisor $(G, \phi)$ responds to the report of a generated event as follows:

• The current state of the plant model is computed as $q_{current} = \delta(q_{last}, \sigma)$, where $\sigma$ is the reported event and $q_{last}$ is the state of the plant model before the event is reported.

• All events specified by $\phi(q_{current})$ are disabled.

Note that such an approach requires the supervisor to remember $\phi(q_i)\ \forall q_i \in Q$, which is equivalent to keeping in memory an $n \times m$ matrix, where $n$ is the number of plant states and $m$ is the cardinality of the event alphabet. We show that there is a simpler alternative implementation.

Algorithm 3.1: Online Implementation of Optimal Control

input : $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathcal{C})$, initial state $q_0$
output: Optimal control actions
begin
    Compute $G^{opt}$ from $G$;
    Set $\theta^\star = \theta_{min}$;                 /* Min. $\theta$ for all iterations */
    Set $\nu^\star$ = measure vector of $G^{opt}$;
    Set $q_{current} = q_0$;                    /* Initial state */
    while true do                                /* Infinite loop */
        Observe event $\sigma_j$;               /* Perfect observation */
        Compute $q_{current} = \delta(q_{current}, \sigma_j)$;
        for k = 1 to m do                        /* m = cardinality of $\Sigma$ */
            Compute $q_{next} = \delta(q_{current}, \sigma_k)$;
            if $[q_{current}, \sigma_k, q_{next}] \in \mathcal{C}$ then
                /* If $q_{next} = q_j$ then $\nu^\star(q_{next}) = \nu^\star_j$;
                   if $q_{current} = q_i$ then $\nu^\star(q_{current}) = \nu^\star_i$ */
                if $\nu^\star(q_{next}) < \nu^\star(q_{current})$ then
                    Disable $\sigma_k$;
                endif
            else
                Enable $\sigma_k$;
            endif
        endfor
    endw
end
Lemma 3.1: For a given finite state plant $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathcal{C})$ and the corresponding optimal language measure $\nu^\star$, the pair $(G, \nu^\star)$ completely specifies the optimal supervision policy.

Proof: The optimal configuration $G^\star$ is characterized as follows [12], [7]:

• if for states $q_i, q_j \in Q$, $\nu^\star|_i > \nu^\star|_j$, then all controllable transitions $q_i \to q_j$ are disabled.

• if for states $q_i, q_j \in Q$, $\nu^\star|_i \leq \nu^\star|_j$, then all controllable transitions $q_i \to q_j$ are enabled.

It follows that if the supervisor has access to the unsupervised plant model $G$ and the language measure vector $\nu^\star$, then the optimal policy can be implemented by the following procedure:

1) Compute the current state of the plant model as $q_{current} = \delta(q_{last}, \sigma)$, where $\sigma$ is the reported event and $q_{last}$ is the state of the plant model before the event is reported. Let $q_{current} = q_i$.

2) Disable all controllable transitions $q_i \to q_k$ if $\nu^\star_i > \nu^\star_k$, for all $q_k \in Q$.

This completes the proof. The procedure is summarized in Algorithm 3.1. □
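The two-step procedure in the proof of Lemma 3.1 can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the dictionary encoding of $\delta$, the set `controllable` of state-event pairs, and the numeric measure values are assumptions made for the example.

```python
def disabled_events(q_current, events, delta, controllable, nu_star):
    """Events to disable at q_current: controllable transitions that lead
    to a state with strictly smaller optimal measure."""
    disabled = []
    for sigma in events:
        q_next = delta.get((q_current, sigma))
        if q_next is None:
            continue  # event is not defined at this state
        if (q_current, sigma) in controllable and nu_star[q_current] > nu_star[q_next]:
            disabled.append(sigma)
    return disabled

def supervise(q0, observed_events, events, delta, controllable, nu_star):
    """Track the plant state along a perfectly observed event sequence and
    yield (state, disabled events) after each reported event."""
    q = q0
    for sigma in observed_events:
        q = delta[(q, sigma)]
        yield q, disabled_events(q, events, delta, controllable, nu_star)

# Toy two-state plant (assumed values, for illustration only).
events = ["a", "b"]
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
controllable = {(0, "a"), (1, "a")}
nu_star = {0: 0.5, 1: -0.2}
trace = list(supervise(1, ["a", "b"], events, delta, controllable, nu_star))
```

Only the plant model and the vector $\nu^\star$ are stored; the $n \times m$ feedback matrix is never materialized, which is the point of the lemma.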
The approach given in Lemma 3.1 is important from the perspective that it forms the intuitive basis for extending the optimal control algorithm derived under the assumption of perfect observation to situations where one or more transitions are unobservable at the supervisory level.

4. Optimal Control under Non-trivial 
Unobservability 

This section makes use of the unobservability analysis presented in Section 2.4 to derive a modified online-implementable control algorithm for partially observable probabilistic finite state plant models.

4.1. The Fraction Net Observer 



In Section 2.4 the notion of instantaneous description was introduced as a map $\mathcal{Q} : p(L(G)) \to 2^Q$ from the set of observed event traces to the power set of the state set $Q$, such that given an observed event trace $\omega$, $\mathcal{Q}(\omega) \subseteq Q$ is the set of states that the underlying deterministic finite state plant can possibly occupy at the given instant. We constructed a Petri Net observer (Algorithm B.4) and showed that the instantaneous description can be computed online with polynomial complexity. However, for a plant modeled by a probabilistic regular language, the knowledge of the event occurrence probabilities allows us not only to compute the set of possible current states (i.e. the instantaneous description) but also the probabilistic cost of ending up in each state in the instantaneous description. To achieve this objective, we modify the Petri Net Observer introduced in Section 2.4.2 by assigning (possibly) fractional weights, computed as functions of the event occurrence probabilities, to the input arcs. The output arcs are still given unity weights. In the sequel, the Petri Net observer with possibly fractional arc weights is referred to as the Fraction Net Observer (FNO). First we need to formalize the notation for the Fraction Net observer.



Definition 4.1: Given a finite state terminating plant model $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$ and an unobservability map $p$, the Fraction Net observer (FNO), denoted as $\mathcal{F}_{(G_\theta, p)}$, is a labelled Petri Net $(Q, \Sigma, A^I, A^O, w^I, x^{[0]})$ with fractional arc weights and possibly fractional markings, where $Q$ is the set of places, $\Sigma$ is the event label alphabet, $A^I \subseteq Q \times \Sigma \times Q$ and $A^O \subseteq Q \times \Sigma$ are the sets of input and output arcs, $w^I$ is the input weight assignment function and $x^{[0]} \in \mathcal{B}$ (See Notation 2.3) is the initial marking. The output arcs are defined to have unity weights.

The algorithmic construction of an FNO is derived next. We assume that the Petri Net observer has already been computed (by Algorithm B.4) with $Q$ the set of places, $\Sigma$ the set of transition labels, $A^I \subseteq Q \times \Sigma \times Q$ the set of input arcs and $A^O \subseteq Q \times \Sigma$ the set of output arcs.

Definition 4.2: The input weight assigning function $w^I : A^I \to (0, \infty)$ for the Fraction Net observer is defined as:

$$\forall q_i \in Q,\ \forall \sigma_j \in \Sigma,\ \forall q_k \in Q, \quad \delta(q_i, \sigma_j) = q_\ell \implies w^I(q_i, \sigma_j, q_k) = \sum_{\substack{\omega \in \Sigma^* \ \text{s.t.} \\ \delta^*(q_\ell, \omega) = q_k \,\wedge\, p(q_\ell, \omega) = \epsilon}} (1-\theta)^{|\omega|}\, \tilde{\pi}(q_\ell, \omega)$$

where $\delta : Q \times \Sigma \to Q$ is the transition map of the underlying DFSA, $p$ is the given unobservability map and $\tilde{\pi}$ is the event cost (i.e. the occurrence probability) function [29]. It follows that the weight on an input arc from transition $\sigma_j$ (having an output arc from place $q_i$) to place $q_k$ is the sum total of the conditional probabilities of all completely unobservable paths by which the underlying plant can reach the state $q_k$ from state $q_\ell$, where $q_\ell = \delta(q_i, \sigma_j)$.

Computation of the input arc weights for the Fraction Net observer requires the notion of the phantom automaton (See Definition 2.17). The computation of the arc weights for the FNO is summarized in Algorithm 4.1.

Proposition 4.1: Given a Petri Net observer $(Q, \Sigma, A^I, A^O)$, the event occurrence probability matrix $\tilde{\Pi}$ and the transition probability matrix for the phantom automaton $\wp(\Pi)$, Algorithm 4.1 computes the arc weights for the fraction net observer as stated in Definition 4.2.

Proof: Algorithm 4.1 employs the following identity to compute input arc weights:

$$\forall q_i \in Q,\ \forall \sigma_j \in \Sigma,\ \forall q_k \in Q, \quad w^I(q_i, \sigma_j, q_k) = \begin{cases} \left[I - (1-\theta)\wp(\Pi)\right]^{-1}_{\ell k}, & \text{if } (q_i, \sigma_j, q_k) \in A^I \wedge \delta(q_i, \sigma_j) = q_\ell \\ 0, & \text{otherwise} \end{cases}$$

which follows from the following argument. Assume that for the given unobservability map $p$, $G^{\wp}$ is the phantom automaton for the underlying plant $G$. We observe that the measure of the language of all strings initiating from state $q_\ell$ and terminating at state $q_k$ in the phantom automaton $G^{\wp}$ is given by $\left[I - (1-\theta)\wp(\Pi)\right]^{-1}_{\ell k}$. Since every string generated by the phantom automaton is completely unobservable (in the sense of Definition 2.17), we conclude

$$\left[I - (1-\theta)\wp(\Pi)\right]^{-1}_{\ell k} = \sum_{\substack{\omega \in \Sigma^* \ \text{s.t.} \\ \delta^*(q_\ell, \omega) = q_k \,\wedge\, p(q_\ell, \omega) = \epsilon}} (1-\theta)^{|\omega|}\, \tilde{\pi}(q_\ell, \omega) \tag{26}$$

This completes the proof. □

Algorithm 4.1: Computation of Arc Weights for FNO

input : Petri Net Observer $(Q, \Sigma, A^I, A^O)$, Event Occurrence Probability Matrix $\tilde{\Pi}$, $\wp(\Pi)$
output: $w^I$, $w^O$
begin
    /* Computing Weights for Input Arcs */
    for i = 1 to n do
        for j = 1 to m do
            for k = 1 to n do
                if $(q_i, \sigma_j, q_k) \in A^I$ then
                    Compute $q_\ell = \delta(q_i, \sigma_j)$;
                    Set $w^I(q_i, \sigma_j, q_k) = \left[I - (1-\theta)\wp(\Pi)\right]^{-1}_{\ell k}$;
                endif
            endfor
        endfor
    endfor
end
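The arc-weight computation of Algorithm 4.1 reduces to one matrix inversion followed by a table lookup. The sketch below illustrates this under stated assumptions: states are indexed from 0, `phantom` stands for $\wp(\Pi)$, the helper `invert` is a plain Gauss-Jordan routine written for the example, and the two-state plant is hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fno_input_weights(input_arcs, delta, phantom, theta):
    """w^I(q_i, sigma, q_k) = [I - (1 - theta) * phantom]^{-1}[l][k],
    where q_l = delta(q_i, sigma)."""
    n = len(phantom)
    m = [[float(i == j) - (1.0 - theta) * phantom[i][j] for j in range(n)]
         for i in range(n)]
    inv = invert(m)
    return {(i, sigma, k): inv[delta[(i, sigma)]][k] for (i, sigma, k) in input_arcs}

# Hypothetical plant: from state 0 an unobservable event leads to state 1
# with probability 0.2; event "a" is observable and returns to state 0.
phantom = [[0.0, 0.2], [0.0, 0.0]]
delta = {(0, "a"): 0, (1, "a"): 0}
arcs = [(0, "a", 0), (0, "a", 1), (1, "a", 0), (1, "a", 1)]
weights = fno_input_weights(arcs, delta, phantom, theta=0.01)
```

In this toy case the only fractional weights are those on arcs into state 1, equal to $(1-\theta) \times 0.2$.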
In Section 2.4.2 we presented Algorithm B.5 to compute the Instantaneous State Description $\mathcal{Q}(\omega)$ online without referring to the transition probabilities. The approach consisted of firing all enabled transitions (in the Petri Net observer) labelled by $\sigma_j$ on observing the event $\sigma_j$ in the underlying plant. The set of possible current states then consisted of all states which corresponded to places with one or more tokens. For the Fraction Net observer we use a slightly different approach which involves computation of a set of event-indexed state transition matrices.

Definition 4.3: For a Fraction Net observer $(Q, \Sigma, A^I, A^O, w^I, x^{[0]})$, the set of event-indexed state transition matrices $\Gamma = \{\Gamma^{\sigma_j} : \sigma_j \in \Sigma\}$ is a set of $m$ matrices each of dimension $n \times n$ (where $m$ is the cardinality of the event alphabet $\Sigma$ and $n$ is the number of places), such that on observing event $\sigma_j$ in the underlying plant, the updated marking $x^{[k+1]}$ for the FNO (due to firing of all enabled $\sigma_j$-labelled transitions in the net) can be obtained from the existing marking $x^{[k]}$ as follows:

$$x^{[k+1]} = x^{[k]}\, \Gamma^{\sigma_j} \tag{27}$$

The procedure for computing $\Gamma$ is presented in Algorithm 4.2. Note that the only inputs to the algorithm are the transition matrix for the phantom automaton, the unobservability map $p$ and the transition map for the underlying plant model. The next proposition shows that the algorithm is correct.

Proposition 4.2: Algorithm 4.2 correctly computes the set of event-indexed transition matrices $\Gamma = \{\Gamma^{\sigma_j} : \sigma_j \in \Sigma\}$ for a given fraction net observer $(Q, \Sigma, A^I, A^O, w^I, x^{[0]})$ in the sense stated in Definition 4.3.



Algorithm 4.2: Derivation of Transition Matrices $\Gamma^{\sigma_j}$

input : $\wp(\Pi)$, $\delta$, $p$
output: $\Gamma^{\sigma_j}\ \forall \sigma_j \in \Sigma$
 1 begin
 2   for $j \in \{1, \cdots, m\}$ do               /* m = No. of events */
 3     for $i \in \{1, \cdots, n\}$ do             /* n = No. of places */
 4       if $\delta(q_i, \sigma_j)$ is undefined OR $p(q_i, \sigma_j) = \epsilon$ then
 5         Set $i$-th row of $\Gamma^{\sigma_j}$ = $[0, \cdots, 0]$;
 6       else
 7         Compute $r = \delta(q_i, \sigma_j)$;
 8         Set $i$-th row of $\Gamma^{\sigma_j}$ = $r$-th row of $[I - \wp(\Pi)]^{-1}$;
 9       endif
10     endfor
11   endfor
12 end



Proof: Let the current marking of the Fraction Net observer specified as $(Q, \Sigma, A^I, A^O, w^I, x^{[0]})$ be denoted by $x^{[k]}$, where $x^{[k]} \in [0, \infty)^n$ with $n = \mathrm{Card}(Q)$. Assume event $\sigma_j \in \Sigma$ is observed in the underlying plant model. To obtain the updated marking of the Fraction Net observer, we need to fire all transitions labelled by $\sigma_j$ in the FNO. Since the graph of the FNO is identical with the graph of the Petri Net observer constructed by Algorithm B.4, it follows that if $\delta(q_i, \sigma_j)$ is undefined or the event $\sigma_j$ is unobservable from the state $q_i$ in the underlying plant, then there is a flush-out arc to a transition labelled $\sigma_j$ from the place $q_i$ in the graph of the Fraction Net observer. This implies that the content of place $q_i$ will be flushed out and hence will not contribute to any place in the updated marking $x^{[k+1]}$, i.e.

$$x^{[k]}_i\, \Gamma^{\sigma_j}_{i\ell} = 0 \quad \forall \ell \in \{1, \cdots, n\} \tag{28}$$

implying that the $i$-th row of the matrix $\Gamma^{\sigma_j}$ is $[0, \cdots, 0]$. This justifies Line 5 of Algorithm 4.2. If $\sigma_j$ is defined and observable from the state $q_i$ in the underlying plant, then we note that the contents of the place $q_i$ end up in all places $q_\ell \in Q$ such that there exists an input arc $(q_i, \sigma_j, q_\ell)$ in the FNO. Moreover, the contribution to the place $q_\ell$ coming from place $q_i$ is weighted by $w^I(q_i, \sigma_j, q_\ell)$. Denote this contribution by $c_{i\ell}$. Then we have

$$c_{i\ell} = w^I(q_i, \sigma_j, q_\ell)\, x^{[k]}_i \tag{29}$$

Note that $\sum_i c_{i\ell} = x^{[k+1]}_\ell$, since contributions from all places to $q_\ell$ sum to the value of the updated marking in the place $q_\ell$. Recalling from Proposition 4.1 that

$$w^I(q_i, \sigma_j, q_\ell) = \left[I - (1-\theta)\wp(\Pi)\right]^{-1}_{r\ell} \tag{30}$$

where $q_r = \delta(q_i, \sigma_j)$ in the underlying plant, the result follows. □
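The row-copy construction of Algorithm 4.2 can be sketched as follows. The encoding is an assumption made for illustration: states are indexed from 0, `observable` is a hypothetical dictionary standing in for the unobservability map $p$, and `w` is a precomputed inverse matrix supplied by the caller.

```python
def transition_matrices(events, n, delta, observable, w):
    """Build gamma[sigma], one n x n matrix per event.

    Row i is zero when sigma is undefined or unobservable at state i
    (flush-out); otherwise it is a copy of row r of w, where
    r = delta(i, sigma).
    """
    gamma = {}
    for sigma in events:
        rows = []
        for i in range(n):
            r = delta.get((i, sigma))
            if r is None or not observable.get((i, sigma), False):
                rows.append([0.0] * n)   # place i contributes nothing
            else:
                rows.append(list(w[r]))  # copy the r-th row of the inverse
        gamma[sigma] = rows
    return gamma

# Hypothetical two-state plant: "u" is unobservable at state 0.
events = ["a", "u"]
delta = {(0, "a"): 0, (0, "u"): 1, (1, "a"): 0}
observable = {(0, "a"): True, (0, "u"): False, (1, "a"): True}
w = [[1.0, 0.198], [0.0, 1.0]]  # assumed inverse, e.g. from the arc-weight step
gamma = transition_matrices(events, 2, delta, observable, w)
```

Because the marking is a row vector multiplied on the left, a place that is flushed out corresponds to a zero row, not a zero column.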
Proposition 4.2 allows an alternate computation of the Instantaneous State Description. We assume that the initial state of the underlying plant is known and hence the initial marking for the FNO is assigned as follows:

$$x^{[0]}_i = \begin{cases} 1, & \text{if } q_i \text{ is the initial state} \\ 0, & \text{otherwise} \end{cases} \tag{31}$$

It is important to note that since the underlying plant is a deterministic finite state automaton (DFSA) having only one initial state, the initial marking of the Fraction Net observer has only one place with value 1 and all remaining places are empty. It follows from Proposition 4.2 that for a given initial marking $x^{[0]}$ of the FNO, the marking after observing a string $\omega = \sigma_{r_1} \cdots \sigma_{r_k}$, where $\sigma_{r_j} \in \Sigma$, is obtained as:

$$x^{[|\omega|]} = x^{[0]} \prod_{j=1}^{k} \Gamma^{\sigma_{r_j}} \tag{32}$$

Referring to the notation for instantaneous description introduced in Definition 2.18, we have

$$\mathcal{Q}(\omega) = \{ q_i \in Q : x^{[|\omega|]}_i > 0 \} \tag{33}$$

Remark 4.1: We observe that to solve the State Determinacy problem, we only need to know if the individual marking values are non-zero. The specific values of the entries in the marking $x^{[k]}$, however, allow us to estimate the cost of occupying individual states in the instantaneous description $\mathcal{Q}(\omega)$.
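The marking update of Eq. (27) and the membership test of Eq. (33) amount to repeated row-vector by matrix products followed by a positivity check. The sketch below is illustrative only; the matrix `gamma["a"]` holds assumed values and is not taken from the tables of this paper.

```python
def propagate(x0, observed, gamma):
    """Return the marking after the observed string: the row vector x0
    multiplied by gamma[sigma] for each observed event, in order."""
    x = list(x0)
    for sigma in observed:
        g = gamma[sigma]
        x = [sum(x[i] * g[i][j] for i in range(len(x))) for j in range(len(x))]
    return x

def possible_states(x):
    """Indices of places with non-zero marking (the instantaneous description)."""
    return {i for i, value in enumerate(x) if value > 0}

# Assumed transition matrix for a single observable event "a".
gamma = {"a": [[1.0, 0.2], [1.0, 0.2]]}
x1 = propagate([1.0, 0.0], ["a"], gamma)
x2 = propagate([1.0, 0.0], ["a", "a"], gamma)
```

Only the support of the marking matters for state determinacy, while the numerical entries carry the occupancy costs mentioned in Remark 4.1.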

4.2. State Entanglement Due to Partial Observability 

The markings of the FNO $\mathcal{F}_{(G_\theta, p)}$ for the plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$ in case of perfect observation are of the following form:

$$\forall k \in \mathbb{N}, \quad x^{[k]} = [0 \cdots 1 \cdots 0]^T \in \mathcal{B} \quad \text{(See Notation 2.3)}$$

It follows that for a perfectly observable system, $\mathcal{B}$ is an enumeration of the state set $Q$ in the sense that $x^{[k]}_i = 1$ implies that the current state is $q_i \in Q$. Under a non-trivial unobservability map $p$, the set of all possible FNO markings proliferates and we can interpret $x^{[k]}$ after the $k$-th observation instance as the current state of the observed dynamics. This follows from the fact that no previous knowledge beyond that of the current FNO marking $x^{[k]}$ is required to define the future evolution of $x^{[k]}$. The effect of partial observation can then be interpreted as adding new states to the model, with each new state a linear combination of the underlying states enumerated in $\mathcal{B}$.

Drawing an analogy with the phenomenon of state entanglement in quantum mechanics, we refer to $\mathcal{B}$ as the set of pure states, while all other occupancy estimates that may appear are referred to as mixed or entangled states. Even for a finite state plant model, the cardinality of the set of all possible entangled states is not guaranteed to be finite.

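Lemma 4.1: Let $\mathcal{F}_{(G_\theta, p)}$ with initial marking $x^{[0]} \in \mathcal{B}$ be the FNO for the underlying terminating plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$ with uniform termination probability $\theta$. Then for any observed string $\omega = \sigma_{r_1} \cdots \sigma_{r_k}$ of length $k \in \mathbb{N}$ with $\sigma_{r_j} \in \Sigma\ \forall j \in \{1, \cdots, k\}$, the occupancy estimate $x^{[k]}$, after occurrence of the $k$-th observable transition, satisfies:

$$x^{[k]} \in \left[0, \frac{1}{\theta}\right]^{\mathrm{Card}(Q)} \setminus \{0\} \tag{34a}$$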

Proof: Let the initial marking $x^{[0]} \in \mathcal{B}$ be given by

$$x^{[0]} = [0 \cdots \underbrace{1}_{i\text{-th element}} \cdots 0]^T \tag{35}$$

Elementwise non-negativity of $x^{[k]}$ for all $k \in \mathbb{N}$ follows from the fact that $x^{[0]} \in \mathcal{B}$ is elementwise non-negative and each $\Gamma^{\sigma}$ is a non-negative matrix for all $\sigma \in \Sigma$. We also need to show that $x^{[k]}$ cannot be the zero vector. The argument is as follows: Assume, if possible, $x^{[\ell]}\Gamma^{\sigma} = 0$ where $x^{[\ell]} \neq 0$ and $\sigma \in \Sigma$ is the current observed event. It follows from the construction of the transition matrices that $\forall q_i \in Q$, $x^{[\ell]}_i \neq 0$ implies that either $\delta(q_i, \sigma)$ is undefined or $p(q_i, \sigma) = \epsilon$. In either case, it is impossible to observe the event $\sigma$ with the current occupancy estimate $x^{[\ell]}$, which is a contradiction. Finally, we need to prove the elementwise upper bound of $\frac{1}{\theta}$ on $x^{[k]}$. We note that $x^{[k]}_j$ is the sum total of the conditional probabilities of all strings $u \in \Sigma^*$ initiating from state $q_i \in Q$ (since $\forall j,\ x^{[0]}_j = \delta_{ij}$) that terminate on the state $q_j \in Q$ and satisfy

$$p(u) = \omega \tag{36}$$

It follows that $x^{[k]}_j \leq x^{[0]}\left[I - (1-\theta)\Pi\right]^{-1}\big|_j$, since the right-hand side is the sum of the conditional probabilities of all strings that go to $q_j$ from $q_i$ irrespective of observability. Hence we conclude:

$$\|x^{[k]}\|_\infty \leq \left\|x^{[0]}\left[I - (1-\theta)\Pi\right]^{-1}\right\|_\infty \leq \frac{1}{\theta}$$

which completes the proof. □
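The final inequality in the proof of Lemma 4.1 can be checked with the Neumann series. The short derivation below is a sketch that assumes $\Pi$ is a stochastic matrix (each row sums to one, written $\Pi\mathbf{1} = \mathbf{1}$) and $\theta \in (0,1)$:

$$\left[I - (1-\theta)\Pi\right]^{-1}\mathbf{1} = \sum_{k=0}^{\infty} (1-\theta)^k\, \Pi^k \mathbf{1} = \sum_{k=0}^{\infty} (1-\theta)^k\, \mathbf{1} = \frac{1}{\theta}\,\mathbf{1}$$

Every entry of $\left[I - (1-\theta)\Pi\right]^{-1}$ is non-negative and each row sums to $\frac{1}{\theta}$, so no entry exceeds $\frac{1}{\theta}$. Since $x^{[0]}$ is a pure state, the product $x^{[0]}\left[I - (1-\theta)\Pi\right]^{-1}$ is a single row of this matrix, which gives the stated bound.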

Remark 4.2: It follows from Lemma 4.1 that the entangled states belong to a compact subset of $\mathbb{R}^{\mathrm{CARD}(Q)}$.

Definition 4.4: (Entangled State Set:) For a given $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathcal{C})$ and $p$, the entangled state set $Q_{\mathcal{F}} \subset \mathbb{R}^{\mathrm{CARD}(Q)} \setminus \{0\}$ is the set of all possible markings of the FNO initiated at any of the pure states $x^{[0]} \in \mathcal{B}$.

4.3. An Illustrative Example of State Entanglement 

We consider the plant model as presented in the left-hand plate of Figure 2. The finite state plant model with the unobservable transition (marked in red dashed) along with the constructed Petri net observer is shown in Figure 2. The event occurrence probabilities assumed are shown in Table II and the transition probability matrix $\Pi$ is shown in Table III.



(37) 



Given $\theta = 0.01$, we apply Algorithm B.3 to obtain:

$$\left[I - (1-\theta)\wp(\Pi)\right]^{-1} =$$



0.2 
1 






TABLE II: Event Occurrence Probabilities

TABLE III: Transition Probability Matrix $\Pi$

Fig. 2. Underlying plant and Petri Net Observer

The arc weights are then computed for the Fraction Net Observer and the result is shown in the right-hand plate of Figure 2. Note that the arcs in red are the ones with fractional weights in this case; all other arc weights are unity. The set of transition matrices $\Gamma$ is now computed from Algorithm 4.2.

We consider three different observation sequences, assuming that the underlying plant starts from the same initial state in each case (i.e. the initial marking of the FNO is given by $\alpha = [1\ 0]^T$). The final markings (i.e. the entangled states) are given by:



aPP = 



1.20 " 












0.24 


,aPP = 


0.2 


,«pr a = 











0.2 
















(38) 



Note that while in the case of the Petri Net observer we could only say that the instantaneous description for the first sequence is $\{q_1, q_2\}$, for the fraction net observer we have an estimate of the cost of occupying each state (1.2 and 0.24 respectively for the first case).

Next we consider a slightly modified underlying plant with the event occurrence probabilities as tabulated in Table IV. The modified plant (denoted as Model 2) is shown in the right-hand plate of Figure 3. The two models are simulated with the initial pure state set to $[0\ 1\ 0]$ in each case. We note that the number of entangled states in the course of simulated operation more than doubles from 106 for Model 1 to 215 for Model 2 (See Figure 4). In the simulation, entangled state vectors were distinguished with a tolerance of $10^{-10}$ on the max norm.

TABLE IV: Event Occurrence Probabilities For Model 2

Fig. 3. Underlying models to illustrate effect of unobservability on the cardinality of the entangled state set

Fig. 4. Total number of distinct entangled states encountered as a function of the number of observation ticks, i.e. the number of observed events
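Counting distinct entangled states with a tolerance on the max norm can be sketched as below. This is one straightforward way to do it, not necessarily the procedure used for the reported simulation; the function name and the sample vectors are assumptions made for the example.

```python
def count_distinct(vectors, tol=1e-10):
    """Count vectors that differ from every previously kept vector by more
    than `tol` in the max norm."""
    representatives = []
    for v in vectors:
        is_new = all(
            max(abs(a - b) for a, b in zip(v, r)) > tol
            for r in representatives
        )
        if is_new:
            representatives.append(list(v))
    return len(representatives)

# Sample markings (assumed values): the second differs from the first by 1e-12.
samples = [[1.0, 0.0], [1.0, 1e-12], [0.0, 1.0], [1.2, 0.24]]
n_distinct = count_distinct(samples)
```

The comparison is quadratic in the number of stored states, which is adequate for a few hundred states as in the counts quoted above.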

4.4. Maximization of Integrated Instantaneous Measure

Definition 4.5: (Instantaneous Characteristic:) Given a plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$, the instantaneous characteristic $\hat{\chi}(t)$ is defined as a function of plant operation time $t \in [0, \infty)$ as follows:

$$\hat{\chi}(t) = \chi|_i \tag{39}$$

where $q_i \in Q$ is the state occupied at time $t$.

Definition 4.6: (Instantaneous Measure For Perfectly Observable Plants:) Given a plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$, the instantaneous measure $\hat{\nu}_\theta(t)$ is defined as a function of plant operation time $t \in [0, \infty)$ as follows:

$$\hat{\nu}_\theta(t) = \langle \alpha(t), \nu_\theta \rangle \tag{40}$$

where $\alpha \in \mathcal{B}$ corresponds to the state that $G$ is observed to occupy at time $t$ (Refer to Eq. 4201 1) and $\nu_\theta$ is the renormalized language measure vector for the underlying plant $G$ with uniform termination probability $\theta$.

Next we show that the optimal control algorithms presented in Section 3 for perfectly observable situations can be interpreted as maximizing the expectation of the time-integrated instantaneous measure for the finite state plant model under consideration.

Proposition 4.3: For the unsupervised plant $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathcal{C})$ with all transitions observable at the supervisory level, let $G^\star$ be the optimally supervised plant and $G^{\#}$ be obtained by arbitrarily disabling controllable transitions. Denoting the instantaneous measures for $G^\star$ and $G^{\#}$ by $\hat{\nu}^\star_\theta(t)$ and $\hat{\nu}^{\#}_\theta(t)$ respectively, for some uniform termination probability $\theta \in (0,1)$, we have

$$E\left(\int_0^t \hat{\nu}^\star_\theta(\tau)\, d\tau\right) \geq E\left(\int_0^t \hat{\nu}^{\#}_\theta(\tau)\, d\tau\right) \quad \forall t \in [0, \infty),\ \forall \theta \in (0,1) \tag{41}$$

where $t$ is the plant operation time and $E(\cdot)$ denotes the expected value of the expression within braces.

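Proof: Let the stochastic transition probability matrix for an arbitrary finite state plant model be denoted by $\Pi$ and denote the Cesàro limit as: $\mathscr{C}(\Pi) = \lim_{k \to \infty} \frac{1}{k} \sum_{j=0}^{k-1} \Pi^j$. Denoting the final stable state probability vector as $p^i$, where the plant is assumed to initiate operation in state $q_i$, we claim that $p^i_j = \mathscr{C}(\Pi)_{ij}$, which follows immediately from noting that if the initiating state is $q_i$ then

$$(p^i)^T = [0 \cdots \underbrace{1}_{i\text{-th element}} \cdots 0] \lim_{k \to \infty} \frac{1}{k} \sum_{j=0}^{k-1} \Pi^j$$

i.e. $(p^i)^T$ is the $i$-th row of $\mathscr{C}(\Pi)$. Hence, we have

$$E\left(\int_0^t \hat{\chi}(\tau)\, d\tau\right) = \int_0^t E(\hat{\chi}(\tau))\, d\tau = t \langle p^i, \chi \rangle = t\, \nu_0|_i \quad (\text{Note: } \theta = 0)$$

where the finite number of states guarantees that the expectation operator and the integral can be exchanged. Recalling that optimal supervision elementwise maximizes the language measure vector $\nu_0$, we conclude

$$E\left(\int_0^t \hat{\chi}^\star(\tau)\, d\tau\right) \geq E\left(\int_0^t \hat{\chi}^{\#}(\tau)\, d\tau\right) \quad \forall t \in [0, \infty) \tag{42}$$

where the $\hat{\chi}(t)$ for the plant configurations $G^\star$ and $G^{\#}$ is denoted as $\hat{\chi}^\star$ and $\hat{\chi}^{\#}$ respectively. Noting that the construction of the Petri Net observer (Algorithm B.4) implies that in the case of perfect observation each transition leads to exactly one place, we conclude that the instantaneous measure is given by

$$\hat{\nu}_\theta(t) = \nu_\theta|_i \quad \text{where the current state at time } t \text{ is } q_i \tag{43}$$

Furthermore, we recall from Corollary 2.1

$$\mathscr{C}(\Pi)\nu_\theta = \mathscr{C}(\Pi)\chi \implies E(\hat{\nu}_\theta(t)) = E(\hat{\chi}(t)) \quad \forall t \in [0, \infty) \tag{44}$$

which leads to the following argument:

$$\begin{aligned}
& E\left(\int_0^t \hat{\chi}^\star(\tau)\, d\tau\right) \geq E\left(\int_0^t \hat{\chi}^{\#}(\tau)\, d\tau\right) && \forall t \in [0, \infty) \\
\implies & \int_0^t E(\hat{\chi}^\star(\tau))\, d\tau \geq \int_0^t E(\hat{\chi}^{\#}(\tau))\, d\tau && \forall t \in [0, \infty) \\
\implies & \int_0^t E(\hat{\nu}^\star_\theta(\tau))\, d\tau \geq \int_0^t E(\hat{\nu}^{\#}_\theta(\tau))\, d\tau && \forall t \in [0, \infty),\ \forall \theta \in (0,1) \\
\implies & E\left(\int_0^t \hat{\nu}^\star_\theta(\tau)\, d\tau\right) \geq E\left(\int_0^t \hat{\nu}^{\#}_\theta(\tau)\, d\tau\right) && \forall t \in [0, \infty),\ \forall \theta \in (0,1)
\end{aligned}$$

This completes the proof. □

Fig. 5. Time integrals of instantaneous measure and instantaneous characteristic vs. operation time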
Next we formalize a procedure of implementing an optimal 
supervision policy from a knowledge of the optimal language 
measure vector for the underlying plant. 

4.5. The Optimal Control Algorithm 

For any finite state underlying plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$ and a specified unobservability map $p$, it is possible to define a probabilistic transition system as a possibly infinite state generalization of PFSA, which we denote as the entangled transition system corresponding to the underlying plant and the specified unobservability map. In defining the entangled transition system (Definition 4.7), we use a similar formalism as stated in Section 2.1, with the exception of dropping the last argument for controllability specification in Eq. 0. Controllability needs to be handled separately to address the issues of partial controllability arising as a result of partial observation.

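Definition 4.7: (Entangled Transition System:) For a given plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$ and an unobservability map $p$, the entangled transition system $\mathcal{E}_{(G,p)} = (Q_{\mathcal{F}}, \Sigma, \Delta, \tilde{\pi}_{\mathcal{E}}, \chi_{\mathcal{E}})$ is defined as:

1) The transition map $\Delta : Q_{\mathcal{F}} \times \Sigma^* \to Q_{\mathcal{F}}$ is defined as:

$$\forall \alpha \in Q_{\mathcal{F}}, \quad \Delta(\alpha, \omega) = \alpha \prod_{i=1}^{m} \Gamma^{\sigma_i} \quad \text{where } \omega = \sigma_1 \cdots \sigma_m$$

2) The event generation probabilities $\tilde{\pi}_{\mathcal{E}} : Q_{\mathcal{F}} \times \Sigma^* \to [0,1]$ are specified as:

$$\tilde{\pi}_{\mathcal{E}}(\alpha, \sigma) = \sum_{i=1}^{\mathrm{CARD}(Q)} (1-\theta)\, N(\alpha_i)\, \tilde{\pi}(q_i, \sigma)$$

3) The characteristic function $\chi_{\mathcal{E}} : Q_{\mathcal{F}} \to [-1, 1]$ is defined as: $\chi_{\mathcal{E}}(\alpha) = \langle \alpha, \chi \rangle$

Remark 4.3: The definition of $\tilde{\pi}_{\mathcal{E}}$ is consistent in the sense:

$$\forall \alpha \in Q_{\mathcal{F}}, \quad \sum_{\sigma \in \Sigma} \tilde{\pi}_{\mathcal{E}}(\alpha, \sigma) = \sum_{i} N(\alpha_i)(1-\theta) = 1 - \theta$$

implying that if $Q_{\mathcal{F}}$ is finite then $\mathcal{E}_{(G,p)}$ is a perfectly observable terminating model with uniform termination probability $\theta$.

Proposition 4.4: The renormalized language measure $\nu^{\mathcal{E}}_\theta(\alpha)$ for the state $\alpha \in Q_{\mathcal{F}}$ of the entangled transition system $\mathcal{E}_{(G,p)} = (Q_{\mathcal{F}}, \Sigma, \Delta, \tilde{\pi}_{\mathcal{E}}, \chi_{\mathcal{E}})$ can be computed as follows:

$$\nu^{\mathcal{E}}_\theta(\alpha) = \langle \alpha, \nu_\theta \rangle \tag{45}$$

where $\nu_\theta$ is the language measure vector for the underlying terminating plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$ with uniform termination probability $\theta$.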

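Proof: We first compute the measure of the pure states $\mathcal{B} \subset Q_{\mathcal{F}}$ of $\mathcal{E}_{(G,p)} = (Q_{\mathcal{F}}, \Sigma, \Delta, \tilde{\pi}_{\mathcal{E}}, \chi_{\mathcal{E}})$, denoted by the vector $\nu^{\mathcal{E}}_\theta$. Since every string generated by the phantom automaton is completely unobservable, it follows that the measure of the empty string $\epsilon$ from any state $\alpha \in \mathcal{B}$ is given by $\alpha\left[I - (1-\theta)\wp(\Pi)\right]^{-1}\chi$. Let $\alpha$ correspond to the state $q_i \in Q$ in the underlying plant. Then the measure of the set of all strings generated from $\alpha \in \mathcal{B}$ having at least one observable transition in the underlying plant is given by

$$(1-\theta)\, \alpha \left[I - (1-\theta)\wp(\Pi)\right]^{-1} \left(\Pi - \wp(\Pi)\right) \nu^{\mathcal{E}}_\theta \tag{46}$$

which is simply the measure of the set of all strings of the form $\omega_1 \sigma \omega_2$ where $p(\omega_1 \sigma \omega_2) = \sigma p(\omega_2)$. It therefore follows from the additivity of measures that

$$\begin{aligned}
& \nu^{\mathcal{E}}_\theta = (1-\theta)\left[I - (1-\theta)\wp(\Pi)\right]^{-1}\left(\Pi - \wp(\Pi)\right)\nu^{\mathcal{E}}_\theta + \left[I - (1-\theta)\wp(\Pi)\right]^{-1}\chi \\
\implies & \left\{ I - (1-\theta)\left[I - (1-\theta)\wp(\Pi)\right]^{-1}\left(\Pi - \wp(\Pi)\right) \right\} \nu^{\mathcal{E}}_\theta = \left[I - (1-\theta)\wp(\Pi)\right]^{-1}\chi \\
\implies & \nu^{\mathcal{E}}_\theta = \left[I - (1-\theta)\Pi\right]^{-1}\chi = \nu_\theta
\end{aligned} \tag{47}$$

which implies that for any pure state $\alpha \in \mathcal{B}$, we have $\nu^{\mathcal{E}}_\theta(\alpha) = \langle \alpha, \nu_\theta \rangle$. The general result then follows from the following linear relation arising from the definitions of $\tilde{\pi}_{\mathcal{E}}$ and $\chi_{\mathcal{E}}$:

$$\forall \alpha \in \mathcal{B},\ \forall k \in \mathbb{R}, \quad \nu^{\mathcal{E}}_\theta(k\alpha) = k\, \nu^{\mathcal{E}}_\theta(\alpha) \tag{48}$$

This completes the proof. □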
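The algebra behind Eq. (47) is a single left-multiplication. The derivation below is a sketch that assumes the additivity relation has the form $\nu = (1-\theta)\left[I - (1-\theta)\wp(\Pi)\right]^{-1}\left(\Pi - \wp(\Pi)\right)\nu + \left[I - (1-\theta)\wp(\Pi)\right]^{-1}\chi$, with $\nu$ standing for the measure vector of the pure states. Multiplying both sides on the left by $I - (1-\theta)\wp(\Pi)$ gives

$$\begin{aligned}
\left[I - (1-\theta)\wp(\Pi)\right]\nu &= (1-\theta)\left(\Pi - \wp(\Pi)\right)\nu + \chi \\
\nu - (1-\theta)\wp(\Pi)\nu - (1-\theta)\Pi\nu + (1-\theta)\wp(\Pi)\nu &= \chi \\
\left[I - (1-\theta)\Pi\right]\nu &= \chi
\end{aligned}$$

The phantom terms cancel, so the measure of the pure states does not depend on which transitions are unobservable, which is the content of Proposition 4.4.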
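Definition 4.8: (Instantaneous Characteristic for Entangled Transition Systems:) Given an underlying plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$ and an unobservability map $p$, the instantaneous characteristic $\hat{\chi}_{\mathcal{E}}(t)$ for the corresponding entangled transition system $\mathcal{E}_{(G,p)} = (Q_{\mathcal{F}}, \Sigma, \Delta, \tilde{\pi}_{\mathcal{E}}, \chi_{\mathcal{E}})$ is defined as a function of plant operation time $t \in [0, \infty)$ as follows:

$$\hat{\chi}_{\mathcal{E}}(t) = \langle \alpha(t), \chi \rangle \tag{49}$$

where $\alpha(t)$ is the entangled state occupied at time $t$.

Definition 4.9: (Instantaneous Measure For Partially Observable Plants:) Given an underlying plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$ and an unobservability map $p$, the instantaneous measure $\hat{\nu}_\theta(t)$ is defined as a function of plant operation time $t \in [0, \infty)$ as follows:

$$\hat{\nu}_\theta(t) = \langle \alpha(t), \nu^{\mathcal{E}}_\theta \rangle \tag{50}$$

where $\alpha \in Q_{\mathcal{F}}$ is the entangled state at time $t$ and $\nu^{\mathcal{E}}_\theta$ is the renormalized language measure vector for the corresponding entangled transition system $\mathcal{E}_{(G,p)} = (Q_{\mathcal{F}}, \Sigma, \Delta, \tilde{\pi}_{\mathcal{E}}, \chi_{\mathcal{E}})$.

Corollary 4.1: (Corollary to Proposition 4.4) For a given plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\tilde{\Pi}, \chi, \mathcal{C})$ and an unobservability map $p$, the instantaneous measure $\hat{\nu}_\theta : [0, \infty) \to [-1, 1]$ is given by

$$\hat{\nu}_\theta(t) = \langle \alpha(t), \nu_\theta \rangle \tag{51}$$

where $\alpha(t)$ is the current state of the entangled transition system $\mathcal{E}_{(G,p)} = (Q_{\mathcal{F}}, \Sigma, \Delta, \tilde{\pi}_{\mathcal{E}}, \chi_{\mathcal{E}})$ at time $t$ and $\nu_\theta$ is the language measure vector for the underlying plant $G$.

Proof: Follows from Definitions 4.9, 4.7 and Proposition 4.4. □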
Proposition 4.4 has a crucial consequence. It follows that elementwise maximization of the measure vector $\nu_\theta$ for the underlying plant automatically maximizes the measures of each of the entangled states irrespective of the particular unobservability map $p$. This allows us to directly formulate the optimal supervision policy for cases where the cardinality of the entangled state set is finite. However, before we embark upon the construction of such policies, we need to address the controllability issues arising due to state entanglement. We note that for a given entangled state $\alpha \in Q_{\mathcal{F}} \setminus \mathcal{B}$, an event $\sigma \in \Sigma$ may be controllable from some but not all of the states $q_i \in Q$ that satisfy $\alpha_i > 0$. Thus the notion of controllability introduced in Definition 2.7 needs to be generalized; disabling of a transition $\sigma \in \Sigma$ from an entangled state can still change the current state. We formalize the analysis by defining a set of event-indexed disabled transition matrices by suitably modifying $\Gamma^{\sigma}$ as follows:

Definition 4.10: For a given plant $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathcal{C})$, the event-indexed disabled transition matrices $\Gamma^{\sigma}_D$ are defined as

$$\Gamma^{\sigma}_D\big|_{ij} = \begin{cases} \delta_{ij}, & \text{if } \sigma \text{ is controllable at } q_i \text{ and } p(q_i, \sigma) = \sigma \\ \Gamma^{\sigma}_{ij}, & \text{otherwise} \end{cases}$$

Evolution of the current entangled state $\alpha$ to $\alpha'$ due to the firing of the disabled transition $\sigma \in \Sigma$ is then computed as:

$$\alpha' = \alpha\, \Gamma^{\sigma}_D \tag{52}$$

Remark 4.4: If an event $\sigma \in \Sigma$ is uncontrollable at every state $q_i \in Q$, then $\Gamma^{\sigma}_D = \Gamma^{\sigma}$. On the other hand, if event $\sigma$ is always controllable (and hence by our assumption always observable), then we have $\Gamma^{\sigma}_D = I$. In general, we have $\Gamma^{\sigma}_D \neq \Gamma^{\sigma} \neq I$.

Proposition 4.3 shows that optimal supervision in the case of perfect observation yields a policy that maximizes the time-integral of the instantaneous measure. We now outline a procedure (See Algorithm 4.3) to maximize $\int_0^t \hat{\nu}_\theta(\tau)\, d\tau$ when the underlying plant has a non-trivial unobservability map.
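The construction of the disabled transition matrices in Definition 4.10 is a row-wise switch between the identity and $\Gamma^{\sigma}$. The sketch below illustrates it; the boolean lists `controllable_at` and `observable_at` are a hypothetical per-state encoding of the two conditions in the definition, and the matrix entries are assumed values.

```python
def disabled_transition_matrix(gamma_sigma, controllable_at, observable_at):
    """Row i becomes the i-th row of the identity when sigma is controllable
    and observable at state i; otherwise row i of gamma_sigma is kept."""
    n = len(gamma_sigma)
    result = []
    for i in range(n):
        if controllable_at[i] and observable_at[i]:
            result.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            result.append(list(gamma_sigma[i]))
    return result

# Assumed transition matrix for one event on a two-state plant.
gamma_sigma = [[1.0, 0.2], [1.0, 0.2]]
mixed = disabled_transition_matrix(gamma_sigma, [True, False], [True, True])
never = disabled_transition_matrix(gamma_sigma, [False, False], [True, True])
always = disabled_transition_matrix(gamma_sigma, [True, True], [True, True])
```

The three calls correspond to the general case and to the two extreme cases singled out in Remark 4.4.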



Algorithm 4.3: Optimal Control under Partial Observation (Preliminary Procedure For Illustration)

input : $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathcal{C})$, $p$, Initial State $q_0$ for $G$
begin
    while true do                                /* Infinite Loop */
        Compute the optimal measure vector $\nu^\star$ for $G$
        Set the current entangled state to $\alpha = q_0\,[I - \wp(\Pi)]^{-1}$
        if current entangled state is $\alpha$ then
            for $\sigma \in \Sigma$ do
                if $\langle \alpha\Gamma^{\sigma}, \nu^\star \rangle < \langle \alpha\Gamma^{\sigma}_D, \nu^\star \rangle$ then
                    Disable $\sigma$
                endif
            endfor
        endif
        Observe next event $\sigma \in \Sigma$
        if $\sigma$ is enabled then
            Update the entangled state to $\alpha\Gamma^{\sigma}$
        else
            Update the entangled state to $\alpha\Gamma^{\sigma}_D$
        endif
    endw
end
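The decision and update steps of Algorithm 4.3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the dictionaries `gamma` and `gamma_d` hold assumed matrices for $\Gamma^{\sigma}$ and $\Gamma^{\sigma}_D$, and `nu_star` holds assumed measure values.

```python
def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def vec_mat(x, m):
    """Row vector x times matrix m."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def events_to_disable(alpha, gamma, gamma_d, nu_star):
    """Disable sigma when <alpha Gamma^sigma, nu*> < <alpha Gamma_D^sigma, nu*>."""
    return [
        sigma for sigma in gamma
        if inner(vec_mat(alpha, gamma[sigma]), nu_star)
        < inner(vec_mat(alpha, gamma_d[sigma]), nu_star)
    ]

def update(alpha, sigma, enabled, gamma, gamma_d):
    """Entangled-state update after observing sigma."""
    return vec_mat(alpha, gamma[sigma] if enabled else gamma_d[sigma])

# Assumed data: "a" is controllable everywhere, "b" is uncontrollable.
alpha = [1.0, 0.2]
gamma = {"a": [[0.0, 1.0], [0.0, 1.0]], "b": [[1.0, 0.0], [1.0, 0.0]]}
gamma_d = {"a": [[1.0, 0.0], [0.0, 1.0]], "b": gamma["b"]}
nu_star = [0.5, -0.3]
to_disable = events_to_disable(alpha, gamma, gamma_d, nu_star)
after_b = update(alpha, "b", True, gamma, gamma_d)
```

An uncontrollable event is never disabled by this rule, because its two matrices coincide and the strict inequality fails.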



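Lemma 4.2: Let the following condition be satisfied for a plant $G = (Q, \Sigma, \delta, \tilde{\Pi}, \chi, \mathcal{C})$ and an unobservability map $p$:

$$\mathrm{CARD}(Q_{\mathcal{F}}) < \infty \tag{53}$$

Then the control actions generated by Algorithm 4.3 are optimal in the sense that

$$E\left(\int_0^t \hat{\nu}^\star_\theta(\tau)\, d\tau\right) \geq E\left(\int_0^t \hat{\nu}^{\#}_\theta(\tau)\, d\tau\right) \quad \forall t \in [0, \infty),\ \forall \theta \in (0,1) \tag{54}$$

where $\hat{\nu}^\star_\theta(t)$ and $\hat{\nu}^{\#}_\theta(t)$ are the instantaneous measures at time $t$ for control actions generated by Algorithm 4.3 and an arbitrary policy respectively.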

Proof: Case 1: First we consider the case where the following condition is true:

$$\forall \sigma \in \Sigma,\ \left(\Gamma^D_\sigma = \Gamma_\sigma\right) \vee \left(\forall \alpha \in Q_F,\ \alpha\Gamma^D_\sigma = \alpha\right) \tag{55}$$

which can be paraphrased as follows:

    Each event is either uncontrollable at every state $q \in Q$ in the underlying plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\widetilde{\Pi}, \chi, \mathscr{C})$ or is controllable at every state at which it is observable.

We note that the entangled transition system qualifies as a perfectly observable probabilistic finite state machine (see Remark 4.3), since the unobservability effects have been eliminated by introducing the entangled states. If the condition stated in Eq. (55) is true, then no generalization of the notion of event controllability in $\mathcal{E}_{(G,\mathfrak{p})} = (Q_F, \Sigma, \Delta, \tilde{\pi}_E, \chi_E)$ is required (see Definition 4.10). Under this assumption, the claim of the lemma follows from Lemma 3.1 by noting that Algorithm 4.3 reduces to the procedure stated in Algorithm 3.1 when we view the entangled system as a perfectly observable PFSA model.
Case 2: Next we consider the general scenario where the condition in Eq. (55) is relaxed. We note that the key to the online implementation result stated in Lemma 3.1 is the Monotonicity Lemma proved in [12], which states that for any given terminating plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\widetilde{\Pi}, \chi, \mathscr{C})$ with uniform termination probability $\theta$, the following iteration sequence elementwise increases the measure vector monotonically:

1. Compute $\nu_\theta$

2. If $\nu_\theta|_j < \nu_\theta|_i$, then disable all events $q_i \xrightarrow{\sigma} q_j$; otherwise enable all events $q_i \xrightarrow{\sigma} q_j$

3. Go to step 1.

The proof of the Monotonicity Lemma [12] assumes that "disabling" $q_i \xrightarrow{\sigma} q_j$ replaces it with a self loop at state $q_i$ labelled $\sigma$ with the same generation probability; i.e., $\widetilde{\Pi}(q_i, \sigma)$ remains unchanged. Now if there exists $\sigma \in \Sigma$ with $\Gamma^D_\sigma \neq I$, then we need to consider the fact that on disabling $\sigma$, the new transition is no longer a self loop, but ends up in some other state $q_k \in Q$. Under this more general situation, we claim that Algorithm 4.3 remains correct; in other words, we claim that the following procedure elementwise increases the measure vector monotonically:

1. Compute $\nu_\theta$

2. Let $q_i \xrightarrow{\sigma} q_j$ (if enabled) and $q_i \xrightarrow{\sigma} q_k$ (if disabled)

3. If $\nu_\theta|_j < \nu_\theta|_k$, then disable $q_i \xrightarrow{\sigma} q_j$; otherwise enable $q_i \xrightarrow{\sigma} q_j$

4. Go to step 1.

This is guaranteed by Proposition A.1 in Appendix A. Convergence of this iterative process and the optimality of the resulting supervision policy in the sense of Definition 2.12 can be worked out along the same lines as shown in [12]. This completes the proof. □
In order to extend the result of Lemma 4.2 to the general case where the cardinality of the entangled state set can be infinite, we need to introduce a sequence of finite state approximations to the potentially infinite state entangled transition system. This would allow us to work out the above extension as a natural consequence of continuity arguments. The finite state approximations are parametrized by $\eta \in (0,1]$, which approaches $0$ from above as we derive closer and closer approximations. The formal definition of such an $\eta$-quantized approximation for $\mathcal{E}_{(G,\mathfrak{p})} = (Q_F, \Sigma, \Delta, \tilde{\pi}_E, \chi_E)$ is stated next:

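Definition 4.11: ($\eta$-Quantized Approximation) For a plant $G_\theta = (Q, \Sigma, \delta, (1-\theta)\widetilde{\Pi}, \chi, \mathscr{C})$, an unobservability map $\mathfrak{p}$ and a given $\eta \in (0,1]$, a probabilistic finite state machine $\mathcal{E}^\eta_{(G,\mathfrak{p})} = (Q^\eta_F, \Sigma, \Delta^\eta, \tilde{\pi}^\eta_E, \chi_E)$ qualifies as an $\eta$-quantized approximation of the corresponding entangled transition system $\mathcal{E}_{(G,\mathfrak{p})} = (Q_F, \Sigma, \Delta, \tilde{\pi}_E, \chi_E)$ if

$$\Delta^\eta(\alpha, \omega) = \zeta_\eta\big(\Delta(\alpha, \omega)\big) \tag{56}$$

where $\zeta_\eta : [0,1]^{\mathrm{CARD}(Q)} \to Q^\eta_F$ is a quantization map satisfying:

$$\mathrm{CARD}(Q^\eta_F) < \infty \tag{57a}$$
$$\zeta_\eta(\alpha) = \alpha \quad \text{for every pure state } \alpha \tag{57b}$$
$$\forall \alpha \in Q_F,\ \|\zeta_\eta(\alpha) - \alpha\|_\infty \leqq \eta \tag{57c}$$

where $\|\cdot\|_\infty$ is the standard max norm. Furthermore, we denote the language measure of the state $\alpha \in Q^\eta_F$ as $\nu^\eta_\theta(\alpha)$, and the measure vector for the pure states is denoted as $\nu^\eta_\theta$.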
We note the following: 

1) For a given $\eta \in (0,1]$, there may exist an uncountably infinite number of distinct probabilistic finite state machines that qualify as an $\eta$-quantized approximation to $\mathcal{E}_{(G,\mathfrak{p})} = (Q_F, \Sigma, \Delta, \tilde{\pi}_E, \chi_E)$; i.e., the approximation is not unique.

2) $\lim_{\eta \to 0^+} \mathcal{E}^\eta_{(G,\mathfrak{p})} = \mathcal{E}_{(G,\mathfrak{p})}$.

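3) The compactness of $[0,1]^{\mathrm{CARD}(Q)}$ is crucial in the definition.

4) The set of pure states of $\mathcal{E}_{(G,\mathfrak{p})} = (Q_F, \Sigma, \Delta, \tilde{\pi}_E, \chi_E)$ is a subset of $Q^\eta_F$.

5) The measure of an arbitrary state $\alpha \in Q^\eta_F$ is given by $\langle \alpha, \nu^\eta_\theta \rangle$.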
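As an illustration of conditions (57a)-(57c), the following is a minimal sketch of one possible quantization map, assuming a uniform grid on $[0,1]$; the names `make_quantizer` and `zeta` are hypothetical and not taken from the paper. The grid step is chosen to be at most $2\eta$ and to divide 1 exactly, so that rounding moves each coordinate by at most $\eta$ and leaves the 0/1 entries of pure states unchanged.

```python
import math

def make_quantizer(eta):
    """Return one admissible quantization map on [0, 1]^n (uniform grid; an assumption)."""
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must lie in (0, 1]")
    # m grid intervals of width 1/m <= 2*eta, so rounding error is at most 1/(2m) <= eta.
    m = math.ceil(1.0 / (2.0 * eta))

    def zeta(alpha):
        # Round every coordinate to the nearest grid point k/m, k = 0..m.
        return [round(a * m) / m for a in alpha]

    return zeta

eta = 0.1
zeta = make_quantizer(eta)
pure_state = [0.0, 1.0, 0.0]      # unit vector: must be a fixed point (57b)
mixed_state = [0.37, 0.41, 0.22]  # arbitrary entangled state, for illustration only
quantized = zeta(mixed_state)
max_error = max(abs(q - a) for q, a in zip(quantized, mixed_state))
```

Since the grid has finitely many points, the range of such a map is finite, which is what (57a) asks for; many other grids satisfy the same conditions, consistent with the non-uniqueness noted in item 1.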

Lemma 4.3: The language measure vector $\nu^\eta_\theta$ for the set of pure states for any $\eta$-quantized approximation of $\mathcal{E}_{(G,\mathfrak{p})} = (Q_F, \Sigma, \Delta, \tilde{\pi}_E, \chi_E)$ is upper semi-continuous w.r.t. $\eta$ at $\eta = 0$.

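Proof: Let $M_k$ be a sequence in $\mathbb{R}^{\mathrm{CARD}(Q)}$ such that $M_k|_i$ denotes the measure of the expected state after $k \in \mathbb{N} \cup \{0\}$ observations for the chosen $\eta$-quantized approximation $\mathcal{E}^\eta_{(G,\mathfrak{p})}$ beginning from the pure state corresponding to $q_i \in Q$. We note that:

$$\nu^\eta_\theta = \sum_{k=0}^{\infty} M_k \tag{58}$$

Furthermore, we have:

$$M_0 = A\chi^{[0]} \tag{59}$$

where $A = \theta\,[I - (1-\theta)\wp(\Pi)]^{-1}$ and $\chi^{[0]}$ is the perturbation of the characteristic vector $\chi$ due to quantization, implying that

$$\|M_0 - A\chi\|_\infty \leqq \|A\|_\infty\,\eta \tag{60}$$

Denoting $B = [I - (1-\theta)\wp(\Pi)]^{-1}(1-\theta)\big(\Pi - \wp(\Pi)\big)$, we note:

$$M_k = B^k A \chi^{[k]} \ \Rightarrow\ \|M_k - B^k A \chi\|_\infty \leqq \|B\|^k_\infty \|A\|_\infty\,\eta \tag{61}$$

It then follows that we have:

$$\|\nu^\eta_\theta - \nu_\theta\|_\infty \leqq \sum_{k=0}^{\infty} \|B\|^k_\infty \|A\|_\infty\,\eta \tag{62}$$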

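We claim that the following bounds are satisfied:

1) $\|A\|_\infty \leqq 1$

2) $\sum_{k=0}^{\infty} \|B\|^k_\infty \leqq \frac{1}{\theta}$

For the first claim, we note

$$\theta\,[I - (1-\theta)\wp(\Pi)]^{-1} = \sum_{k=0}^{\infty} \theta(1-\theta)^k \wp(\Pi)^k \leqq_{\text{Elementwise}} \sum_{k=0}^{\infty} \theta(1-\theta)^k \Pi^k = \theta\,[I - (1-\theta)\Pi]^{-1}$$

The result then follows by noting that $\theta\,[I - (1-\theta)\Pi]^{-1}$ is a stochastic matrix for all $\theta \in (0,1)$. For the second claim, denoting $e = [1 \cdots 1]^T$, we conclude from stochasticity of $\Pi$:

$$\big(\Pi - \wp(\Pi)\big)e = [I - \wp(\Pi)]e = [I - (1-\theta)\wp(\Pi)]e - \theta\wp(\Pi)e$$
$$\Rightarrow\ [I - (1-\theta)\wp(\Pi)]^{-1}\big(\Pi - \wp(\Pi)\big)e = e - \wp(\Pi)\,\theta\,[I - (1-\theta)\wp(\Pi)]^{-1}e$$
$$\Rightarrow\ Be = \Big(I - \theta\,[I - (1-\theta)\wp(\Pi)]^{-1}\Big)e \tag{63}$$

Since $B$ is a non-negative matrix, it follows from Eq. (63) that:

$$\|B\|_\infty = 1 - \min_i \Big\{\theta\,[I - (1-\theta)\wp(\Pi)]^{-1}e\Big\}_i$$

Noting that $\theta\,[I - (1-\theta)\wp(\Pi)]^{-1} = \theta I + \theta \sum_{k=1}^{\infty} \big((1-\theta)\wp(\Pi)\big)^k$, we have

$$\|B\|_\infty \leqq 1 - \theta < 1 \ \Rightarrow\ \sum_{k=0}^{\infty} \|B\|^k_\infty \leqq \frac{1}{1 - (1-\theta)} = \frac{1}{\theta} \tag{64}$$

Noting that $\nu^0_\theta = \nu_\theta$ and $\theta > 0$, we conclude from Eq. (62):

$$\forall \eta > 0,\ \|\nu^\eta_\theta - \nu^0_\theta\|_\infty \leqq \frac{\eta}{\theta} \tag{65}$$

which implies that $\nu^\eta_\theta$ is upper semi-continuous w.r.t. $\eta$ at $\eta = 0$. This completes the proof. □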
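The two norm bounds used in the proof can be checked numerically on a small example. The sketch below assumes a hypothetical 3-state stochastic matrix `Pi` and a hypothetical unobservable part `P_unobs` (playing the role of $\wp(\Pi)$); neither comes from the paper. The inverse $[I - (1-\theta)\wp(\Pi)]^{-1}$ is approximated by its truncated Neumann series, which converges because $(1-\theta)\wp(\Pi)$ has max row sum below 1.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def neumann_inverse(S, terms=500):
    """Approximate [I - S]^{-1} by the truncated series sum_k S^k."""
    n = len(S)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in power]
    for _ in range(terms):
        power = matmul(power, S)
        total = [[t + p for t, p in zip(rt, rp)] for rt, rp in zip(total, power)]
    return total

def inf_norm(X):
    """Max absolute row sum."""
    return max(sum(abs(x) for x in row) for row in X)

theta = 0.2
# Hypothetical data: Pi is row-stochastic, P_unobs <= Pi elementwise.
Pi = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]
P_unobs = [[0.0, 0.3, 0.0], [0.0, 0.0, 0.3], [0.4, 0.0, 0.0]]

R = neumann_inverse([[(1 - theta) * p for p in row] for row in P_unobs])
A = [[theta * r for r in row] for row in R]
observable = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(Pi, P_unobs)]
B = [[(1 - theta) * x for x in row] for row in matmul(R, observable)]

row_sums_A = [sum(row) for row in A]
row_sums_B = [sum(row) for row in B]
```

For this example the row sums of `A` and `B` add up to one row by row, which is the identity $Be = e - Ae$ behind Eq. (63), and `inf_norm(B)` stays below $1 - \theta$.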
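Lemma 4.4: For any plant $G = (Q, \Sigma, \delta, \widetilde{\Pi}, \chi, \mathscr{C})$ with an unobservability map $\mathfrak{p}$, the control actions generated by Algorithm 4.3 are optimal in the sense that

$$E\left(\int_0^t \nu^\star_\theta(\tau)\,d\tau\right) \geqq E\left(\int_0^t \nu^\#_\theta(\tau)\,d\tau\right) \quad \forall t \in [0,\infty),\ \forall \theta \in (0,1) \tag{66}$$

where $\nu^\star_\theta(t)$ and $\nu^\#_\theta(t)$ are the instantaneous measures at time $t$ for control actions generated by Algorithm 4.3 and an arbitrary policy respectively.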

Proof: First, we note that it suffices to consider terminating plants $G_\theta = (Q, \Sigma, \delta, (1-\theta)\widetilde{\Pi}, \chi, \mathscr{C})$ such that $\theta \leqq \theta_{min}$ (see the definition of $\theta_{min}$) for the purpose of defining the optimal supervision policy [12]. Algorithm 4.3 specifies the optimal control policy for plants with termination probability $\theta$ when the set of entangled states is finite (Lemma 4.2). We claim that the result is true when the finiteness condition stated in Eq. (53) is relaxed. The argument is as follows: the optimal control policy as stated in Algorithm 4.3 for finite $Q_F$ can be paraphrased as

• Maximize language measure for every state offline
• Follow the measure gradient online

Since $\mathrm{CARD}(Q^\eta_F) < \infty$, it follows from Lemma 4.2 that such a policy yields the optimal decisions for an $\eta$-quantized approximation of $\mathcal{E}_{(G,\mathfrak{p})} = (Q_F, \Sigma, \Delta, \tilde{\pi}_E, \chi_E)$ for any $\eta > 0$. As we approach $\eta = 0$, it follows from continuity that there exists $\eta_\star > 0$ such that the sequence of disabling decisions does not change for all $\eta \leqq \eta_\star$, implying that the optimally controlled transition sequence is identical for all $\eta \leqq \eta_\star$. Since it is guaranteed by Definition 4.11 that, for identical transition sequences, the quantized entangled states $\alpha^{[k]}_\eta$ are within $\eta$-balls of the actual entangled state $\alpha^{[k]}$ after the $k$-th observation, we conclude

$$\forall k,\ \forall \eta \in (0, \eta_\star],\ \|\alpha^{[k]}_\eta - \alpha^{[k]}\|_\infty \leqq \eta \tag{67}$$
It therefore follows that for any control policy, we have 



vn e (0,nJ, 



"v e (x)dx 



OtWl.Vg) - <««l,Tf ) dx ( 1 + 



1 

e 1 



(68) 



implying that $\int_0^t \nu^\eta_\theta(\tau)\,d\tau$ is semi-continuous from above at $\eta = 0$, which completes the proof. □
Proposition 4.5: Algorithm 4.4 correctly implements the optimal control policy for an arbitrary finite state plant $G = (Q, \Sigma, \delta, \widetilde{\Pi}, \chi, \mathscr{C})$ with specified unobservability map $\mathfrak{p}$.

Proof: We first note that Algorithm 4.4 is a detailed restatement of Algorithm 4.3 with the exception of the normalization step in Lines 20 and 22. On account of non-negativity of any entangled state $\alpha$ and the fact that $\alpha \neq 0$ (see Lemma 4.1), we have:

$$\mathrm{sign}\big(\alpha(\Gamma_\sigma - \Gamma^D_\sigma)\big) = \mathrm{sign}\big(\mathcal{N}(\alpha)(\Gamma_\sigma - \Gamma^D_\sigma)\big) \tag{69}$$

which verifies the normalization steps. The result then follows immediately from Lemma 4.4. □
Remark 4.5: The normalization steps in Algorithm 4.4 serve to mitigate numerical problems. Lemma 4.1 guarantees that the entangled state $\alpha \neq 0$. However, repeated right multiplication by the transition matrices may result in entangled states with norms arbitrarily close to 0, leading to numerical errors in comparing arbitrarily close floating point numbers. Normalization partially remedies this by ensuring that the entangled states used for the comparisons are sufficiently separated from 0. There is, however, still the issue of approximability: even with normalization, we may need to compare arbitrarily close values. The next proposition addresses this by showing that, in contrast to MDP based models, the optimization algorithm for PFSA is indeed $\lambda$-approximable [19], i.e., deviation from the optimal policy is guaranteed to be small for small errors in value comparisons in Algorithm 4.4. This further implies that the optimization algorithm is robust under small parametric uncertainties in the model as well as to errors arising from finite precision arithmetic in digital computer implementations.



Algorithm 4.4: Optimal Control under Partial Observation (Finalized Version)

input : $G = (Q, \Sigma, \delta, \widetilde{\Pi}, \chi, \mathscr{C})$, $\mathfrak{p}$
output: Optimal Control Actions

 1 begin                                          /* OFFLINE EXECUTION */
 2     Compute $\nu^\star$;
 3     Set $\theta = \theta_{min}$;
 4     Compute $M = [I - (1 - \theta_{min})\wp(\Pi)]^{-1}$;
 5     for $\sigma \in \Sigma$ do
 6         Compute $\Gamma_\sigma$;                    /* Algorithm 4.2 */
 7         Compute $\Gamma^D_\sigma$;
 8         Compute $T^\sigma = [\Gamma_\sigma - \Gamma^D_\sigma]\nu^\star$;    /* Column Vector */
 9     endfor
10     Initialize $\alpha_0 = [0 \cdots 1 \cdots 0]$ (1 at the $i$-th element);    /* Init. state: $q_i$ */
11     Compute $\alpha = \alpha_0 M$;               /* For $\omega$ s.t. $\mathfrak{p}(q_i, \omega) = \epsilon$ */
                                                  /* ONLINE EXECUTION */
12     while true do
13         for $\sigma \in \Sigma$ do               /* Control Action */
14             if $\alpha T^\sigma < 0$ then
15                 Disable $\sigma$;
16             endif
17         endfor
18         Observe event $\sigma$;
19         if $\sigma$ is disabled then
20             $\alpha = \mathcal{N}(\alpha\Gamma^D_\sigma)$;
21         else
22             $\alpha = \mathcal{N}(\alpha\Gamma_\sigma)$;
23         endif
24     endw
25 end
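The online portion of Algorithm 4.4 (lines 12 to 24) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the matrices $\Gamma_\sigma$, $\Gamma^D_\sigma$ and the vector $\nu^\star$ are taken as given inputs, the normalization $\mathcal{N}$ is assumed to be division by the sum of entries, and all function names and the two-state toy data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def vec_mat(alpha, M):
    """Row vector times matrix."""
    return [dot(alpha, [row[j] for row in M]) for j in range(len(M[0]))]

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def normalize(alpha):
    """Assumed form of N(.): rescale a non-negative, non-zero vector to unit sum."""
    total = sum(alpha)
    if total <= 0.0:
        raise ValueError("entangled state must be non-negative and non-zero")
    return [a / total for a in alpha]

def decision_vectors(gamma, gamma_d, nu_star):
    """Offline step (line 8): T[sigma] = (Gamma_sigma - Gamma_sigma^D) nu*."""
    T = {}
    for s in gamma:
        diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(gamma[s], gamma_d[s])]
        T[s] = mat_vec(diff, nu_star)
    return T

def disabled_events(alpha, T):
    """Lines 13-17: disable sigma whenever alpha . T[sigma] < 0."""
    return {s for s, col in T.items() if dot(alpha, col) < 0.0}

def observe(alpha, event, gamma, gamma_d, T):
    """Lines 18-23: update and normalize the entangled state after one observation."""
    disabled = disabled_events(alpha, T)
    M = gamma_d[event] if event in disabled else gamma[event]
    return normalize(vec_mat(alpha, M)), disabled

# Hypothetical two-state toy data, for illustration only.
gamma = {"a": [[0.5, 0.5], [0.0, 1.0]]}
gamma_d = {"a": [[1.0, 0.0], [0.0, 1.0]]}
nu_star = [0.2, -0.4]
T = decision_vectors(gamma, gamma_d, nu_star)
alpha, disabled = observe([1.0, 0.0], "a", gamma, gamma_d, T)
```

Because the disabling test only depends on the sign of $\alpha T^\sigma$, rescaling $\alpha$ by any positive constant leaves the decision unchanged, which is the content of Eq. (69).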







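Proposition 4.6: (Approximability) In a finite precision implementation of Algorithm 4.4 with real numbers distinguished up to $\lambda > 0$, i.e.,

$$\forall a, b \in \mathbb{R},\ |a - b| \leqq \lambda \Rightarrow a = b \tag{70}$$

we have $\forall t \in [0,\infty),\ \forall \theta \in (0,1)$,

$$0 \leqq E\left(\int_0^t \nu^\star_\theta(\tau)\,d\tau\right) - E\left(\int_0^t \nu^\#_\theta(\tau)\,d\tau\right) \leqq \lambda \tag{71}$$

where $\nu^\star_\theta(t)$ and $\nu^\#_\theta(t)$ are the instantaneous measures at time $t$ for the exact (i.e., infinite precision) and approximate (up to $\lambda$-precision) implementations of the optimal policy respectively.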

Proof: Let $G_\theta = (Q, \Sigma, \delta, (1-\theta)\widetilde{\Pi}, \chi, \mathscr{C})$ be the underlying plant. First we consider the perfectly observable case, i.e., with every transition observable at the supervisory level. Denoting the optimal and approximate measure vectors obtained by Algorithm B.1 as $\nu^\star_\theta$ and $\nu^\#_\theta$, we claim:

$$\nu^\star_\theta - \nu^\#_\theta \leqq_{\text{Elementwise}} \lambda \tag{72}$$

Using the algebraic structure of the Monotonicity Lemma [12] (also see Proposition A.1), we obtain:

$$\nu^\star_\theta - \nu^\#_\theta = \theta\,[I - (1-\theta)\Pi^\star]^{-1}(1-\theta)\underbrace{[\Pi^\star - \Pi^\#]\,\nu^\#_\theta}_{M}$$

We note that it follows from the exact optimality of $\nu^\star_\theta$ that

$$\nu^\star_\theta - \nu^\#_\theta \geqq_{\text{Elementwise}} 0 \tag{73}$$

Denoting the $i$-th row of the matrix $M$ as $M_i$, we note that $M_i$ is of the form $\sum_{j=1}^{\mathrm{CARD}(Q)} a_j b_j$ where

$$a_j \leqq \Pi_{ij} \tag{74}$$
$$b_j = \big|\nu^\#_\theta|_k - \nu^\#_\theta|_j\big| \tag{75}$$

We note that the inequality in Eq. (74) follows from the fact that event enabling and disabling is a redistribution of the controllable part of the unsupervised transition matrix $\Pi$. Also, since $\Pi^\#$ was obtained via $\lambda$-precision optimization, we have:



Kit > -vSii) A <ii 



\<k~<\i\ 
Wek — I ^ A 



It therefore follows from stochasticity of TT that: 



ilMII 



A = A 



(76a) 
(76b) 

(77) 



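Hence, noting that $\big\|\theta\,[I - (1-\theta)\Pi^\star]^{-1}\big\|_\infty = 1$, we have:

$$\|\nu^\star_\theta - \nu^\#_\theta\|_\infty \leqq \big\|\theta\,[I - (1-\theta)\Pi^\star]^{-1}\big\|_\infty \times |1-\theta| \times \lambda < \lambda \tag{78}$$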



which proves the claim made in Eq. (72). It then follows from Lemma 3.1 that for the perfectly observable case we have $\forall t \in [0,\infty),\ \forall \theta \in (0,1)$,

$$0 \leqq E\left(\int_0^t \nu^\star_\theta(\tau)\,d\tau\right) - E\left(\int_0^t \nu^\#_\theta(\tau)\,d\tau\right) \leqq \lambda \tag{79}$$

We recall that for a finite entangled state set $Q_F$, the entangled transition system can be viewed as a perfectly observable terminating plant (see Remark 4.3) with possibly partial controllability, implying that we must apply the Generalized Monotonicity Lemma (see Proposition A.1). Noting that the above argument is almost identically applicable for the Generalized Monotonicity Lemma, it follows that the above result is true for any non-trivial unobservability map on the underlying plant $G_\theta$ satisfying $\mathrm{CARD}(Q_F) < \infty$. The extension to the general case of infinite $Q_F$ then follows from the application of the result to $\eta$-approximations of the entangled transition system for $\eta \leqq \eta_\star$ (see Lemma 4.4 for explanation of the bound $\eta_\star$) and recalling the continuity argument stated in Lemma 4.4. This completes the proof. □

The performance of MDP or POMDP based models is computed as the total reward garnered by the agent in the course of operation. The analogous notion for PFSA based modeling is the expected value of the integrated instantaneous characteristic $\int_0^t \chi(\tau)\,d\tau$ (see Definition 4.5) as a function of operation time.



Proposition 4.7: (Performance Maximization) The optimal control policy stated in Algorithm 4.4 maximizes infinite horizon performance in the sense of maximizing the expected integrated instantaneous state characteristic (see Definition 4.5), i.e.,

$$\forall t \in [0,\infty),\quad E\left(\int_0^t \chi^\star(\tau)\,d\tau\right) \geqq E\left(\int_0^t \chi^\#(\tau)\,d\tau\right) \tag{80}$$

where the instantaneous characteristic, at time $t$, for the optimal (i.e., as defined by Algorithm 4.4) and an arbitrary supervision policy is denoted by $\chi^\star(t)$ and $\chi^\#(t)$ respectively.

Proof: We recall that the result is true for the case of perfect observation (see Eq. (42)). Next we recall from Remark 4.3 that if the unobservability map is non-trivial, but has a finite $Q_F$, then the entangled transition system $\mathcal{E}_{(G,\mathfrak{p})}$ can be viewed as a perfectly observable terminating model with uniform termination probability $\theta$. It therefore follows that, for such cases, we have:

$$\forall t \in [0,\infty),\quad E\left(\int_0^t \chi^\star_E(\tau)\,d\tau\right) \geqq E\left(\int_0^t \chi^\#_E(\tau)\,d\tau\right) \tag{81}$$

We recall from the definition of entangled transition systems (see Definition 4.7)

$$\chi_E(t) = \langle \alpha(t), \chi \rangle \tag{82}$$

where $\alpha(t)$ is the entangled state at time $t$, which in turn implies that we have:

$$E(\chi_E) = \langle E(\alpha), \chi \rangle \tag{83}$$

Since $E(\alpha)|_i$ is the expected sum of conditional probabilities of strings terminating on state $q_i$ of the underlying plant, we conclude that $E(\alpha)$ is in fact the stationary state probability vector corresponding to the underlying plant. Hence it follows that $E(\chi_E) = E(\chi)$, implying that for non-trivial unobservability maps that guarantee $\mathrm{CARD}(Q_F) < \infty$, we have

$$\forall t \in [0,\infty),\quad E\left(\int_0^t \chi^\star(\tau)\,d\tau\right) \geqq E\left(\int_0^t \chi^\#(\tau)\,d\tau\right) \tag{84}$$

The general result for infinite entangled state sets (i.e., for unobservability maps which fail to guarantee $\mathrm{CARD}(Q_F) < \infty$) follows from applying the above result to $\eta$-approximations (see Definition 4.11) of the entangled transition system and recalling the continuity result of Lemma 4.4. □



4.6. Computational Complexity 

Computation of the supervision policy for an underlying plant with a non-trivial unobservability map requires computation of $\nu^\star$ (see Step 2 of Algorithm 4.4), i.e., we need to execute Algorithm B.1 first. It was conjectured and validated via extensive simulation in [12] that Algorithm B.1 can be executed with polynomial asymptotic runtime complexity. Noting that each of the remaining steps of Algorithm 4.4 can be executed with the worst case complexity of $n \times n$ matrix inversion (where $n$ is the size of the state set $Q$ of the underlying model), we conclude that the overall runtime complexity of the proposed supervision algorithm is polynomial in the number of underlying model states. Specifically, we have the following result:

Proposition 4.8: The runtime complexity of the offline portion of Algorithm 4.4 (i.e., up to line number 11) is the same as that of Algorithm B.1.

Proof: The asymptotic runtime complexity of Algorithm B.1, as shown in [12], is $M(n \times n) \times O(I)$, where $M(n \times n)$ is the complexity of $n \times n$ matrix inversion and $O(I)$ is the asymptotic bound on the number of iterations of Algorithm B.1. The proof is completed by noting that the complexity of executing lines 3 to 11 of Algorithm 4.4 is $M(n \times n)$. □

Remark 4.6: It is immediate that the online portion of Algorithm 4.4 has the runtime complexity of matrix-vector multiplication. It follows that the measure-theoretic optimization of partially observable plants is no harder to solve than that of plants with perfect observation.

The results of this section establish the following facts:

1) Decision-theoretic processes modeled in the PFSA framework can be efficiently optimized via maximization of the corresponding language measure.

2) The optimization problem for infinite horizon problems is shown to be $\lambda$-approximable, and the solution procedure presented in this paper is robust to modeling uncertainties and computational approximations. This is a significant advantage over POMDP based modeling, as discussed in detail in Section 1.2.

5. Verification In Simulation Experiments 

The theoretical development of the previous sections is 
next validated on two simple decision problems. 

The first example consists of a four state mission execution model. The underlying plant is illustrated in Figure 6. The physical interpretation of the states and events is enumerated in Tables V and VI. G is the ground, initial or mission abort state. We assume the mission to be important; hence abort is assigned a negative characteristic value of −1. M represents correct execution and therefore has a positive characteristic of 0.5. The mission moves to state E on encountering possible system faults (event d) from states G and M. Any further system faults or an attempt to execute the next mission step under such error conditions results in a transition to the critical state C. The only way to correct the situation is to execute fault recovery protocols denoted by r. However, execution of r from the correct mission execution state M results in an abort. Occurrences of system faults d are uncontrollable from every state. Furthermore, under system criticality, we have sensor failure resulting in unobservability of further system faults and success of recovery attempts, i.e., the events d and r are unobservable from state C.

The event occurrence probabilities are tabulated in Table V. We note that the probabilities of successful execution of mission steps (event t) and damage recovery protocols (event r) are both small under system criticality in state C.




Fig. 6. Underlying plant model with four states Q = {G, M, E, C} and alphabet Σ = {t, r, d}: unobservable transitions are denoted by dashed arrows; uncontrollable but observable transitions are shown dimmed.



TABLE V

State Descriptions, Event Occurrence Probabilities & Characteristic Values

State   Physical Meaning     t      r      d      χ
G       Ground/Abort         0.8    0.05   0.15   -1.00
M       Correct Execution    0.5    0.30   0.20    0.50
E       System Fault         0.5    0.20   0.30   -0.20
C       System Critical      0.1    0.10   0.80   -0.25



TABLE VI

Event Descriptions

Event   Physical Meaning
t       Execution of Next Mission Step/Objective Successful
r       Execution of Repair/Damage Recovery Protocol
d       System Fault Encountered



Also, comparison of the event probabilities from states M 
and E reveals that the probability of encountering further 
errors is higher once some error has already occurred and 
the probability of successful repair is smaller. 
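The entries of Table V can be checked mechanically. The short sketch below encodes the table as a Python dictionary (the names `TABLE_V` and `EVENTS` are ours, chosen for illustration), confirms that each row of event probabilities sums to one, and evaluates the two comparisons between states M and E made above.

```python
# Table V: event occurrence probabilities and characteristic values per state.
TABLE_V = {
    "G": {"t": 0.8, "r": 0.05, "d": 0.15, "chi": -1.00},
    "M": {"t": 0.5, "r": 0.30, "d": 0.20, "chi": 0.50},
    "E": {"t": 0.5, "r": 0.20, "d": 0.30, "chi": -0.20},
    "C": {"t": 0.1, "r": 0.10, "d": 0.80, "chi": -0.25},
}
EVENTS = ("t", "r", "d")

# Each state's event probabilities should form a probability distribution.
row_sums = {state: sum(row[e] for e in EVENTS) for state, row in TABLE_V.items()}

# Further faults are more likely, and repair less likely, once an error has occurred.
faults_more_likely_after_error = TABLE_V["E"]["d"] > TABLE_V["M"]["d"]
repair_less_likely_after_error = TABLE_V["E"]["r"] < TABLE_V["M"]["r"]
```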

We simulate the controlled execution of the above de- 
scribed mission under the following three strategies: 

1) Null controller: No control enforced

2) Optimal control under perfect observation: Control enforced using Algorithm 3.1, given that all transitions are observable at the supervisory level

3) Optimal control under partial observation: Control enforced using Algorithm 4.4, given the above described unobservability map

The optimal renormalized measure vector of the system under full observability is computed to be $[-0.0049\ \ {-0.0048}\ \ {-0.0049}\ \ {-0.0051}]^T$. Hence we observe in Figure 7 that the gradient of the instantaneous measure under perfect observation converges to around −0.005. We note that the gradient for the instantaneous measure under partial observation converges close to the former value. The null controller, of course, is significantly poorer.

The performance of the various control strategies is compared based on the expected value of the integrated instantaneous characteristic $E\left(\int_0^t \chi(\tau)\,d\tau\right)$. The simulated results are shown in Figure 8. The null controller performs worst, and the optimal control strategy under perfect observation performs best. As expected, the strategy in which we blindly use the optimal control for perfect observation (Algorithm 3.1) under the given non-trivial unobservability map is exceedingly poor, and close-to-best performance is recovered using the optimal control algorithm under partial observation.

Fig. 7. Gradient of integrated instantaneous measure as a function of operation time (curves: Null Controller; Optimal Control: Partial Obs.; Optimal Control: Full Obs.)

Fig. 8. Performance as a function of operation time (curves: Optimal Policy: Perfect Observation; Optimal Policy: Partial Observation; Null Controller; Assuming Perfect Observation)

The second example is one that originally appeared in the context of POMDPs in [5]. The physical specification of the problem is as follows: The player is given a choice between opening one of two closed doors; one has a reward in the room behind it, the other has a tiger. Entering the latter incurs a penalty in the form of bodily injury. The player can also choose to listen at the doors and attempt to figure out which room has the tiger. The game resets after each play, and the tiger and the reward are randomly placed in the rooms at the beginning of each such play. Listening at the doors does not enable the player to accurately determine the location of the tiger; it merely makes her odds better. However, listening incurs a penalty; it costs the player if she chooses to listen. The scenario is pictorially illustrated in the top part of Figure 9. We model the physical situation in the PFSA framework as shown in the bottom part of Figure 9.




Fig. 9. TOP: Illustration of the physical scenario. BOTTOM: Underlying plant model with seven states and eight alphabet symbols: unobservable transitions are denoted by dashed arrows; uncontrollable but observable transitions are shown dimmed.



TABLE VII

State Descriptions

State   Physical Meaning
N       Game Init
T1      Tiger in 1
T2      Tiger in 2
L1      Listen: Tiger in 1
L2      Listen: Tiger in 2
T       Tiger Chosen
A       Award Chosen

TABLE VIII

Event Descriptions

Event   Physical Meaning
s1      Tiger Placed in 1 (unobs.)
s2      Tiger Placed in 2 (unobs.)
ℓ       Choose Listen (cont.)
ℓ_C     Correct Determination
ℓ_I     Incorrect Determination
c1      Choose 1 (cont.)
c2      Choose 2 (cont.)
n       Game Reset



TABLE IX

Event Occurrence Probabilities & Characteristic Values

State   χ        s1     s2     ℓ      ℓ_C    ℓ_I    c1     c2     n
N        0.00    0.5    0.5
T1      -0.25                  0.33                 0.33   0.33
T2      -0.25                  0.33                 0.33   0.33
L1      -0.75                         0.8    0.2
L2      -0.75                         0.8    0.2
T       -1.00                                                     1
A        1.00                                                     1



The PFSA has seven states Q = {N, T1, T2, L1, L2, T, A} and eight alphabet symbols Σ = {s1, s2, ℓ, ℓ_C, ℓ_I, c1, c2, n}. The physical meanings of the states and alphabet symbols are enumerated in Tables VII and VIII respectively. The characteristic values and the event generation probabilities are tabulated in Table IX. States A and T have characteristics of 1 and −1 to reflect award and bodily injury. The listening states L1 and L2 also have negative characteristic (−0.75) in accordance with the physical specification. An interesting point is the assignment of negative characteristic to the states T1 and T2; this prevents the player from choosing to disable all controllable moves from those states. Physically, this precludes the possibility that the player chooses to not play at all and sits in either of those states forever, which may turn out to be the optimal course of action if the states T1 and T2 are not negatively weighted.

Fig. 10. Control maps as a function of observation ticks (top panel: optimal control under partial observability; bottom panel: optimal control under full observability)

Figure 10 illustrates the difference in the event disabling patterns resulting from the different strategies. We note that the optimal controller under perfect observation never enables event ℓ (event no. 3), since the player never needs to listen if she already knows which room has the reward. In the case of partial observation, the player decides to selectively listen to improve her odds. Also, note that the optimal policy under partial observation enables events significantly more often as compared to the optimal policy under perfect observation. The game actually proceeds via different routes in the two cases; hence it does not make sense to compare the control decisions after a given number of observation ticks, and the differences in the event disabling patterns must be interpreted only in an overall statistical sense.

We compare the simulation results in Figures 11 and 12. We note that, in contrast to the first example, the performance obtained for the optimally supervised partially observable case is significantly lower compared to the situation under full observation. This arises from the physical problem at hand; it is clear that it is impossible in this case to have comparable performance in the two cases, since the possibility of incorrect choice is significant and cannot be eliminated. The expected entangled state and the stationary probability vector on the underlying model states are compared in Figure 13 as an illustration of the result in Proposition 4.7.



6. Summary, Conclusions & Future Work 

In this paper we present an alternate framework based on probabilistic finite state machines (in the sense of Garg [14], [13]) for modeling partially observable decision problems and establish key advantages of the proposed approach over the current state of the art. Namely, we show that the PFSA framework results in approximable problems, i.e., small changes in the model parameters or small numerical errors result in small deviation in the obtained solution. Thus one is guaranteed to obtain near optimal implementations of the proposed supervision algorithm in a computationally efficient manner. This is a significant improvement over the current state of the art in POMDP analysis; several negative results exist that imply it is impossible to obtain a near-optimal supervision policy for arbitrary POMDPs in an efficient manner, unless certain complexity classes collapse (see the detailed discussion in Section 1.2). The key tool used in this paper is the recently reported notion of renormalized measure of probabilistic regular languages. We extend the measure theoretic optimization technique for perfectly observable probabilistic finite state automata to obtain an online implementable supervision policy for finite state underlying plants for which one or more transitions are unobservable at the supervisory level. It is further shown that the proposed supervision policy maximizes the infinite horizon performance in a sense very similar to that generally used in the POMDP framework: in the latter, the optimal policy maximizes the total reward garnered by the plant in the course of operation, while in the former, it is shown that the expected value of the integrated instantaneous state characteristic is maximized. Two simple decision problems are included as examples to illustrate the theoretical development.

Fig. 11. Gradient of integrated instantaneous measure as a function of operation time

Fig. 12. Performance as a function of operation time

Fig. 13. Comparison of expected entangled state with the stationary probability vector on the underlying plant states for the optimal policy under partial observation



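6.1. Future Work

Future work will address the following areas:

1) Validation of the proposed algorithm in real-life systems,
with special emphasis on probabilistic robotics, and detailed
comparison with the POMDP-based approach with respect to
computational complexity.

2) Generalization of the proposed technique to handle
unobservability maps with unbounded memory, i.e.,
unobservability maps that result in infinite-state phantom
automata.

3) Adaptation of the proposed approach to solve finite
horizon decision problems.

Appendix A
Generalized Monotonicity Lemma

The following proposition is a slight generalization of the
corresponding result reported in [12]. It is required to handle
cases where the effect of event disabling is not always a
self-loop at the current state but produces a pre-specified
reconfiguration, e.g.,

Disabling $q_i \xrightarrow{\sigma} q_j$ results in $q_i \xrightarrow{\sigma} q_k$

Note that for every state $q_i \in Q$, it is pre-specified where each
event $\sigma$ will terminate on being disabled. This generalization
is critical to address the partial controllability issues arising
from partial observation at the supervisory level.

Proposition A.1: (Monotonicity) Let $G_\theta = (Q, \Sigma, \delta, (1-\theta)\widetilde{\Pi}, \chi, \mathcal{C})$
be reconfigured to $G^{\#}_\theta = (Q, \Sigma, \delta^{\#}, (1-\theta)\widetilde{\Pi}^{\#}, \chi, \mathcal{C})$ as
follows: $\forall i, j, k \in \{1, 2, \cdots, n\}$, the $(i,j)$th element $\Pi^{\#}_{ij}$ and the
$(i,k)$th element $\Pi^{\#}_{ik}$ of $\Pi^{\#}$ are obtained as:

$$
\begin{array}{ll}
\left.\begin{array}{l} \Pi^{\#}_{ij} = \Pi_{ij} + \beta_{ij} \\ \Pi^{\#}_{ik} = \Pi_{ik} - \beta_{ij} \end{array}\right\} & \text{if } \mu_j > \mu_k \text{ with } \beta_{ij} > 0 \\[2ex]
\left.\begin{array}{l} \Pi^{\#}_{ij} = \Pi_{ij} \\ \Pi^{\#}_{ik} = \Pi_{ik} \end{array}\right\} & \text{if } \mu_j = \mu_k \\[2ex]
\left.\begin{array}{l} \Pi^{\#}_{ij} = \Pi_{ij} - \beta_{ij} \\ \Pi^{\#}_{ik} = \Pi_{ik} + \beta_{ij} \end{array}\right\} & \text{if } \mu_j < \mu_k \text{ with } \beta_{ij} > 0
\end{array}
\tag{85}
$$

Then, for the respective measure vectors $\nu_\theta$ and $\nu^{\#}_\theta$,

$$
\nu^{\#}_\theta \geq_{\text{ELEMENTWISE}} \nu_\theta \quad \forall \theta \in (0,1) \tag{86}
$$

with equality holding if and only if $\Pi^{\#} = \Pi$.

Proof: From the definition of renormalized measure
(Definition 2.9), we have

$$
\begin{aligned}
\nu^{\#}_\theta - \nu_\theta &= \theta\,[I - (1-\theta)\Pi^{\#}]^{-1}\chi - \theta\,[I - (1-\theta)\Pi]^{-1}\chi \\
&= [I - (1-\theta)\Pi^{\#}]^{-1}(1-\theta)(\Pi^{\#} - \Pi)\,\nu_\theta
\end{aligned}
$$

Defining the matrix $\Delta = \Pi^{\#} - \Pi$, and the $i$th row of $\Delta$ as $\Delta_i$,
it follows that

$$
\Delta_i \nu_\theta = \sum_j \beta_{ij}\,\Gamma_{ij} \tag{87}
$$

where

$$
\Gamma_{ij} = \begin{cases}
(\nu_\theta|_k - \nu_\theta|_j) & \text{if } \nu_\theta|_k > \nu_\theta|_j \\
0 & \text{if } \nu_\theta|_k = \nu_\theta|_j \\
(\nu_\theta|_j - \nu_\theta|_k) & \text{if } \nu_\theta|_k < \nu_\theta|_j
\end{cases}
\;\Longrightarrow\; \Gamma_{ij} \geq 0 \quad \forall i, j
$$

Since $\forall i$, $\sum_{j=1}^{n} \Pi_{ij} = \sum_{j=1}^{n} \Pi^{\#}_{ij} = 1$, it follows from
non-negativity of $\Pi$ that $[I - (1-\theta)\Pi^{\#}]^{-1} \geq_{\text{ELEMENTWISE}} 0$. Since
$\beta_{ij} > 0 \ \forall i, j$, it follows that $\Delta_i \nu_\theta \geq 0 \ \forall i \Rightarrow \nu^{\#}_\theta \geq_{\text{ELEMENTWISE}} \nu_\theta$.
For $\nu_\theta|_j \neq 0$ and $\Delta$ as defined above, $\Delta_i \nu_\theta = 0$ if and only if
$\Delta = 0$. Then, $\Pi^{\#} = \Pi$ and $\nu^{\#}_\theta = \nu_\theta$. □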
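The monotonicity claim can be checked numerically on a small example. The following is a minimal sketch, not part of the original development: the function names, the 3-state matrix, and the characteristic vector are illustrative assumptions, and the measure is approximated by fixed-point iteration rather than by an explicit matrix inverse. Probability mass is shifted, in every row, from the lowest-measure destination state to the highest-measure one, which corresponds to the first case of the reconfiguration rule.

```python
def renormalized_measure(Pi, chi, theta, iters=2000):
    """Approximate nu_theta = theta * [I - (1 - theta) * Pi]^{-1} * chi.

    Uses the fixed-point form nu = theta*chi + (1 - theta)*Pi*nu, which is a
    contraction for theta in (0, 1), so plain iteration converges.
    """
    n = len(chi)
    nu = [0.0] * n
    for _ in range(iters):
        nu = [theta * chi[i]
              + (1.0 - theta) * sum(Pi[i][j] * nu[j] for j in range(n))
              for i in range(n)]
    return nu


def shift_mass(Pi, j, k, fraction):
    """Return a copy of Pi where, in each row, a fraction of the
    probability of reaching state k is moved to state j (beta > 0)."""
    new_Pi = [row[:] for row in Pi]
    for row in new_Pi:
        beta = fraction * row[k]
        row[j] += beta
        row[k] -= beta
    return new_Pi


# Illustrative (hypothetical) 3-state example.
theta = 0.1
chi = [1.0, -0.5, 0.2]
Pi = [[0.2, 0.5, 0.3],
      [0.3, 0.3, 0.4],
      [0.5, 0.25, 0.25]]

nu = renormalized_measure(Pi, chi, theta)
j = max(range(3), key=lambda s: nu[s])  # highest-measure state
k = min(range(3), key=lambda s: nu[s])  # lowest-measure state
Pi_sharp = shift_mass(Pi, j, k, fraction=0.5)
nu_sharp = renormalized_measure(Pi_sharp, chi, theta)

# Every component of the measure should not decrease after reconfiguration.
improved = all(b >= a for a, b in zip(nu, nu_sharp))
print(improved)
```

Each row of the reconfigured matrix still sums to one and stays non-negative, which is the condition the proof relies on.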



Appendix B 

Pertinent Algorithms For Measure-theoretic 
Control 



This section enumerates the pertinent algorithms for computing
the optimal supervision policy for a perfectly observable
plant $G = (Q, \Sigma, \delta, \widetilde{\Pi}, \chi, \mathcal{C})$. For proofs of correctness, the
reader is referred to [12].

In Algorithm B.2, we use the following notation:

$$
M_0 = [I - P + \mathscr{C}(\Pi)]^{-1}, \qquad
M_1 = I - [I - P + \mathscr{C}(\Pi)]^{-1}, \qquad
M_2 = \inf_{\alpha \neq 0} \left\| [I - P + \alpha\,\mathscr{C}(\Pi)]^{-1} \right\|_\infty
$$

Also, as defined earlier, $\mathscr{C}(\Pi)$ is the stable probability
distribution, which can be computed using methods reported
widely in the literature [33].



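Algorithm B.1: Computation of Optimal Supervisor

input : $P$, $\chi$, $\mathcal{C}$
output: Optimal set of disabled transitions $\mathscr{D}^{\star}$

begin
    Set $\mathscr{D}^{[0]} = \emptyset$, $\widetilde{\Pi}^{[0]} = \widetilde{\Pi}$, $\theta_\star^{[0]} = 0.99$, $k = 1$, Terminate = false;
    while (Terminate == false) do
        Compute $\theta_\star^{[k]}$;    /* Algorithm B.2 */
        Set $\widetilde{\Pi}^{[k]} = \frac{1 - \theta_\star^{[k]}}{1 - \theta_\star^{[k-1]}}\, \widetilde{\Pi}^{[k-1]}$;
        Compute $\nu^{[k]}$;
        for j = 1 to n do
            for i = 1 to n do
                Disable all controllable $q_i \xrightarrow{\sigma} q_j$ s.t. $\nu_j^{[k]} < \nu_i^{[k]}$;
                Enable all controllable $q_i \xrightarrow{\sigma} q_j$ s.t. $\nu_j^{[k]} \geq \nu_i^{[k]}$;
        Collect all disabled transitions in $\mathscr{D}^{[k]}$;
        if $\mathscr{D}^{[k]} == \mathscr{D}^{[k-1]}$ then
            Terminate = true;
        else
            k = k + 1;
    $\mathscr{D}^{\star} = \mathscr{D}^{[k]}$;    /* Optimal disabling set */
end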
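The disable/enable loop of Algorithm B.1 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm as published: it uses a fixed value of θ instead of recomputing the critical lower bound via Algorithm B.2, it assumes a disabled transition becomes a self-loop at its source state (the convention of [12]), and all names and the 3-state plant are hypothetical.

```python
def measure(n, delta, pi_tilde, chi, disabled, theta, iters=2000):
    """Renormalized measure for the plant with `disabled` transitions
    redirected to self-loops (assumed disabling convention)."""
    Pi = [[0.0] * n for _ in range(n)]
    for (i, k), j in delta.items():
        target = i if (i, k) in disabled else j
        Pi[i][target] += pi_tilde[i][k]
    nu = [0.0] * n
    # Fixed-point iteration for nu = theta*chi + (1 - theta)*Pi*nu.
    for _ in range(iters):
        nu = [theta * chi[i]
              + (1.0 - theta) * sum(Pi[i][j] * nu[j] for j in range(n))
              for i in range(n)]
    return nu


def optimal_disabling(n, delta, pi_tilde, chi, controllable,
                      theta=0.05, max_iter=100):
    """Iterate the disable/enable step until the disabled set is stable."""
    disabled = set()
    nu = measure(n, delta, pi_tilde, chi, disabled, theta)
    for _ in range(max_iter):
        # Disable controllable transitions that lead to a lower-measure state.
        new_disabled = {(i, k) for (i, k), j in delta.items()
                        if (i, k) in controllable and nu[j] < nu[i]}
        if new_disabled == disabled:
            break
        disabled = new_disabled
        nu = measure(n, delta, pi_tilde, chi, disabled, theta)
    return disabled, nu


# Hypothetical plant: state 0 is good, state 2 is bad; events are 0 and 1.
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 2, (2, 0): 1, (2, 1): 2}
pi_tilde = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
chi = [1.0, 0.0, -1.0]
controllable = {(0, 1), (1, 1)}

disabled, nu = optimal_disabling(3, delta, pi_tilde, chi, controllable)
print(sorted(disabled))
```

In this example both controllable transitions lead toward lower-measure states, so both end up disabled and the loop stops after the disabled set repeats.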



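Algorithm B.2: Computation of the Critical Lower Bound $\theta_\star$

input : $P$, $\chi$
output: $\theta_\star$

begin
    Set $\theta_\star = 1$, $\theta_{curr} = 0$; Compute $\mathscr{C}(\Pi)$, $M_0$, $M_1$, $M_2$;
    for j = 1 to n do
        for i = 1 to n do
            if $(\mathscr{C}(\Pi)\chi)_i - (\mathscr{C}(\Pi)\chi)_j \neq 0$ then
                $\theta_{curr} = \frac{1}{8 M_2} \left| (\mathscr{C}(\Pi)\chi)_i - (\mathscr{C}(\Pi)\chi)_j \right|$;
            else
                for r = 0 to n do
                    if $(M_0 \chi)_i \neq (M_0 \chi)_j$ then
                        Break;
                    else
                        if $(M_0 M_1^r \chi)_i \neq (M_0 M_1^r \chi)_j$ then
                            Break;
                if r == 0 then
                    $\theta_{curr} = \frac{\left| \{(M_0 - \mathscr{C}(\Pi))\chi\}_i - \{(M_0 - \mathscr{C}(\Pi))\chi\}_j \right|}{8 M_2}$;
                else
                    if r > 0 AND r ≤ n then
                        $\theta_{curr} = \frac{\left| (M_0 M_1^r \chi)_i - (M_0 M_1^r \chi)_j \right|}{2^{r+3} M_2}$;
                    else
                        $\theta_{curr} = 1$;
            $\theta_\star = \min(\theta_\star, \theta_{curr})$;
end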



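Algorithm B.3: Computation of Phantom Automaton

input : $Q$, $\Sigma$, $\widetilde{\Pi}$, Unobservability map $p$
output: $\mathscr{P}(\Pi)$

begin
    Set $\mathscr{P}(\widetilde{\Pi}) = \widetilde{\Pi}$;
    for i = 1 to n do
        for j = 1 to m do
            if $p(q_i, \sigma_j) = \sigma_j$ then
                $\mathscr{P}(\widetilde{\Pi})_{ij} = 0$;    /* Delete transition */
    for i = 1 to n do
        for j = 1 to n do
            $\mathscr{P}(\Pi)_{ij} = \sum_{k : \delta(q_i, \sigma_k) = q_j} \mathscr{P}(\widetilde{\Pi})_{ik}$;
end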
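The two passes of Algorithm B.3 translate directly into code. The sketch below is illustrative only: the function name, the list-of-lists encoding of $\delta$ and $\widetilde{\Pi}$, and the boolean table standing in for the test $p(q_i, \sigma_k) = \sigma_k$ are assumptions made for this example.

```python
def phantom_automaton(n, delta, pi_tilde, observable):
    """Compute the phantom event and state transition matrices.

    n            -- number of states
    delta[i][k]  -- index of the state reached from state i on event k
    pi_tilde[i][k] -- probability of event k at state i
    observable[i][k] -- True if p(q_i, sigma_k) = sigma_k (event is seen)
    """
    m = len(pi_tilde[0])
    # Pass 1: copy the event probabilities and delete observable transitions.
    phantom_events = [row[:] for row in pi_tilde]
    for i in range(n):
        for k in range(m):
            if observable[i][k]:
                phantom_events[i][k] = 0.0
    # Pass 2: aggregate the surviving event probabilities per target state.
    phantom_states = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(m):
            phantom_states[i][delta[i][k]] += phantom_events[i][k]
    return phantom_events, phantom_states


# Hypothetical 2-state, 2-event plant; only event 0 at state 0 is observable.
delta = [[1, 0], [0, 1]]
pi_tilde = [[0.5, 0.5], [0.25, 0.75]]
observable = [[True, False], [False, False]]

phantom_events, phantom_states = phantom_automaton(2, delta, pi_tilde, observable)
print(phantom_states)
```

Rows of the phantom state matrix generally sum to less than one, since the probability mass of observable transitions has been removed.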



Algorithm B.4: Petri Net Observer

input : $(G, p)$
output: Petri net observer

begin
    I. Create a place $q_j$ for each state $q_j$ in $(G, p)$;
    II. The set of transition labels is $\Sigma$;
    for each observable transition $q_j \xrightarrow{\sigma} q_k$ in $(G, p)$ do
        I. Set the initial state in $(G, p)$ to $q_k$;
        II. Compute $Q(\epsilon)$;
        III. Add a transition labeled $\sigma$ from the place $q_j$ with
             output arcs to all places $q_\ell \in Q(\epsilon)$;
    for each place $q_j$ in the net do
        for each event $\sigma \in \Sigma$ do
            if there is no transition with label $\sigma$ from $q_j$ then
                I. Add a flush-out arc with label $\sigma$ from $q_j$;
end



Algorithm B.5: Online Computation of Possible States

input : Petri net observer, Observed sequence $\omega = \tau_1 \tau_2 \ldots \tau_r$
output: $Q(\omega)$

begin
    I. Compute the initial marking for the observer as follows:
        a. Compute $Q(\epsilon)$;
        b. Put a token in each place $q_j \in Q(\epsilon)$;
    for j = 1 to r do
        I. Fire all enabled transitions labeled $\tau_j$;
        for each place $q_i$ in the observer do
            if number of tokens in $q_i > 0$ then
                I. Normalize the number of tokens in $q_i$ to 1;
    II. $Q(\omega) = \{ q_j \mid q_j \text{ has one token} \}$;
end
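Because token counts are normalized to one after every firing, the marking of the observer is simply a set of places. The following sketch exploits this and tracks the set of possible states directly instead of building an explicit Petri net. It is an illustration under assumptions: $Q(\epsilon)$ is taken to be the set of states reachable through unobservable transitions only, and the dictionary encoding and function names are hypothetical.

```python
def unobservable_reach(states, delta, observable):
    """Close a set of states under unobservable transitions
    (assumed meaning of Q(epsilon) when started from those states)."""
    reach = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for (src, event), dst in delta.items():
            if src == q and not observable[(src, event)] and dst not in reach:
                reach.add(dst)
                stack.append(dst)
    return reach


def possible_states(delta, observable, initial, observed):
    """Set of states the plant may occupy after the observed sequence."""
    # Initial marking: one token in each place of Q(epsilon).
    current = unobservable_reach({initial}, delta, observable)
    for event in observed:
        # Fire all enabled transitions with this label; states without a
        # matching observable transition lose their token (flush-out).
        fired = {delta[(q, event)] for q in current
                 if (q, event) in delta and observable[(q, event)]}
        current = unobservable_reach(fired, delta, observable)
    return current


# Hypothetical plant: event 'b' at state 1 is unobservable.
delta = {(0, 'a'): 1, (1, 'b'): 2, (1, 'a'): 1, (2, 'a'): 0}
observable = {(0, 'a'): True, (1, 'b'): False, (1, 'a'): True, (2, 'a'): True}

after_none = possible_states(delta, observable, 0, [])
after_a = possible_states(delta, observable, 0, ['a'])
after_aa = possible_states(delta, observable, 0, ['a', 'a'])
print(after_none, after_a, after_aa)
```

Using a set makes the normalization step implicit: a state is either possible or not, regardless of how many transitions deposit a token in its place.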